The async API of the future: Reactors

[This is the first spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben@bendarnell.com> wrote:
Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea. In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval!

I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:

- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and forth on this; I like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.

- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.

- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).
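As an illustration of the design sketched above -- a reactor that takes an I/O object, a callback, and optional args, with separate registration methods for the two paradigms -- a minimal select()-based sketch might look like the following (all names are hypothetical, not part of any concrete proposal in this thread):

    import select

    class IOObject:
        # Wraps a platform resource; here just a file descriptor.
        def __init__(self, fd):
            self._fd = fd
        def fileno(self):
            return self._fd

    class Reactor:
        # Callbacks are *not* methods on the I/O object; they are registered
        # separately, together with optional positional arguments.
        def __init__(self):
            self._level = {}      # fd -> (callback, args): fires on every pass while readable
            self._one_shot = {}   # fd -> (callback, args): fires once, then is dropped

        def add_reader(self, ioobj, callback, *args):
            self._level[ioobj.fileno()] = (callback, args)

        def add_reader_once(self, ioobj, callback, *args):
            self._one_shot[ioobj.fileno()] = (callback, args)

        def remove_reader(self, ioobj):
            self._level.pop(ioobj.fileno(), None)
            self._one_shot.pop(ioobj.fileno(), None)

        def run_once(self, timeout=None):
            fds = set(self._level) | set(self._one_shot)
            if not fds:
                return
            readable, _, _ = select.select(list(fds), [], [], timeout)
            for fd in readable:
                if fd in self._one_shot:
                    callback, args = self._one_shot.pop(fd)
                else:
                    callback, args = self._level[fd]
                callback(*args)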
That's a good point. I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it. Note that many GUI events would be level-triggered, but sometimes using the edge-triggered paradigm can work well too: e.g. I imagine that writing code to draw a curve following the mouse as long as a button is pressed might be conveniently written as a loop of the form

    def on_mouse_press(x, y, buttons):
        <set up polygon starting at current x, y>
        while True:
            x, y, buttons = yield <get mouse event>
            if not buttons:
                break
            <extend polygon to x, y>
        <finish polygon>

which itself is registered as a level-triggered handler for mouse presses. (Dealing with multiple buttons is left as an exercise. :-) -- --Guido van Rossum (python.org/~guido)

Hello Guido, On Fri, 12 Oct 2012 11:13:23 -0700 Guido van Rossum <guido@python.org> wrote:
I'd like to know what a sane buffered API for non-blocking I/O may look like, because right now it doesn't seem to make a lot of sense. At least this bug is tricky to resolve: http://bugs.python.org/issue13322
Why isn't it? In practice, you need several callbacks: in Twisted parlance, you have dataReceived but also e.g. ConnectionLost (depending on the transport, you may even imagine other callbacks, for example for things happening on the TLS layer?).
Windows *is* able to do async I/O on things other than sockets (see the discussion about IOCP). It's just that the Windows implementation of select() (the POSIX function call) is limited to sockets. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 11:33 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Good question. It actually depends quite a bit on whether you have an event loop or not -- with the help of an event loop, you can have a level-triggered callback that fills the buffer behind your back (up to a given limit, at which point it should unregister the I/O object); that bug seems to be about a situation without an event loop, where you can't do that. Also the existing io module design never anticipated cooperation with an event loop.
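A small sketch of the buffer-filling helper described here (hypothetical names, assuming a reactor with add_reader/remove_reader methods and an I/O object exposing a non-blocking recv()): the level-triggered callback keeps appending to a buffer and unregisters itself once a limit is hit, so a hostile peer can't consume unbounded memory:

    class BufferedReader:
        # Fills a read buffer behind the app's back, up to a fixed limit.
        def __init__(self, reactor, ioobj, limit=64 * 1024):
            self._reactor = reactor
            self._ioobj = ioobj
            self._limit = limit
            self.buffer = bytearray()
            reactor.add_reader(ioobj, self._on_readable)

        def _on_readable(self):
            chunk = self._ioobj.recv(4096)   # non-blocking read (assumed API)
            if chunk:
                self.buffer.extend(chunk)
            if not chunk or len(self.buffer) >= self._limit:
                # EOF reached or limit hit: stop filling until the app drains us.
                self._reactor.remove_reader(self._ioobj)

        def consume(self, n):
            # The app drains the buffer; re-register once there is room again.
            data = bytes(self.buffer[:n])
            del self.buffer[:n]
            if len(self.buffer) < self._limit:
                self._reactor.add_reader(self._ioobj, self._on_readable)
            return data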
Yes, but I really want to separate the callbacks from the object, so that I don't have to inherit from an I/O object class -- asyncore requires this and IMO it's wrong. It also makes it harder to use the same callback code with different types of I/O objects.
I know, but IOCP is currently not supported in the stdlib. I expect that on Windows, to use IOCP, you'd need to use a different reactor implementation and a different I/O object than the vanilla fd-based ones. My design is actually *inspired* by the desire to support this cleanly. -- --Guido van Rossum (python.org/~guido)

[Responding to a different message that also pertains to the reactors thread] On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam <dreamingforward@gmail.com> wrote:
I'm convinced that the OS has to get involved. I'm not convinced that it will get in the way of designing an abstract unified API -- however that API will have to be more complicated than the kind of event loop that *only* handles network I/O or the kind that *only* handles GUI events. I wonder if Windows' IOCP API that was mentioned before in the parent thread wouldn't be able to handle both though. Windows' event concept seems more general than sockets or GUI events. However I don't know if this is actually how GUI events are handled in Windows.
You should talk to a Tcl/Tk user (if there are any left :-).
I used to be one of those :)
So tell us more about the user experience of having a standard event loop always available in the language, and threads, network I/O and GUI events all integrated. What worked, what didn't? What did you wish had been different? -- --Guido van Rossum (python.org/~guido)

[Responding to yet another message in the original thread] On Thu, Oct 11, 2012 at 9:45 PM, Trent Nelson <trent@snakebite.org> wrote:
Would you really win anything by doing I/O in separate threads, while doing normal request processing in the main thread?
That said, the idea of a common API architected around async I/O, rather than non-blocking I/O, sounds interesting at least theoretically.
(Oh, what a nice distinction.)
In which category does OS X fall?
How closely would our abstracted reactor interface have to match IOCP? The actual IOCP API calls have very little to recommend them -- it's the implementation and the architecture that we're after. But we want it to be able to use actual IOCP calls on all systems that have them.
Maybe all those outdated Snakebite Operating Systems are useful for something after all. ;-P
-- --Guido van Rossum (python.org/~guido)

On Fri, Oct 12, 2012 at 03:49:36PM -0700, Guido van Rossum wrote:
Oh, how'd I forget about OS X! At the worst, it falls into the FreeBSD kqueue camp, having both a) kqueue and b) a performant pthread implementation. However, with the recent advent of Grand Central Dispatch, it's actually on par with Windows' IOCP+threadpool offerings, which is pretty cool. (And apparently there are GCD ports in the works for Solaris, Linux and... Windows?!) Will reply to the other questions in a separate response. Trent.

On Fri, 12 Oct 2012 21:11:20 -0400 Trent Nelson <trent@snakebite.org> wrote:
The port already exists for FreeBSD. As of 8.1, the kernel has enhanced kqueue support for it, and devel/libdispatch installs the GCD code. I'd be surprised if the other *BSDs haven't picked it up yet. All of which makes me think that an async library based on GCD -- and maybe IOCP for Windows, if GCD isn't available there -- would be reasonably portable. A standard Python library that made this as nice to use as it is from MacRuby would be a good thing. You can find jkh (ex FreeBSD RE, now running the OS X systems group for Apple) discussing Python and GCD here: http://stackoverflow.com/questions/7955630/parallel-processing-in-python-a-l... <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

On 12/10/2012 11:49pm, Guido van Rossum wrote:
One could use IOCP or select/poll/... to implement an API which looks like

    class AsyncHub:
        def read(self, fd, nbytes):
            """Return future which is ready when read is complete"""

        def write(self, fd, buf):
            """Return future which is ready when write is complete"""

        def accept(self, fd):
            """Return future which is ready when connection is accepted"""

        def connect(self, fd, address):
            """Return future which is ready when connection has succeeded"""

        def wait(self, timeout=None):
            """Wait till a future is ready; return list of ready futures"""

A reactor could then be built on top of such a hub. -- Richard
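To make "a reactor could then be built on top" concrete, here is one rough way it could look (purely illustrative; it assumes the hub's futures are concurrent.futures-style objects with a result() method):

    class HubReactor:
        # Callback-style layer on top of an AsyncHub-like object.
        def __init__(self, hub):
            self._hub = hub
            self._callbacks = {}      # future -> callback to run when it is ready

        def read_then(self, fd, nbytes, callback):
            future = self._hub.read(fd, nbytes)
            self._callbacks[future] = callback

        def run_forever(self):
            while self._callbacks:
                for future in self._hub.wait(timeout=1.0):
                    callback = self._callbacks.pop(future)
                    callback(future.result())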

On Oct 15, 2012, at 3:11 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
So in general all methods are async; even the wait() could be async if it returned a Future, so that all methods would follow the same concept. I like this as a general API for all types of connections and all underlying OSes. /Rene

On Fri, Oct 12, 2012 at 3:43 PM, Guido van Rossum <guido@python.org> wrote:
Yes, however, as suggested in my other message, there are three desires: {"cross-platform (OS) portability", "speed", "unified API"}, but you can only pick two. One of these has to be sacrificed because there are users for all of those. I think such a decision must be "deferred()" to some "Future(Python4000)" in order to succeed at making a "Grand Unified Theory" for hardware/OS/Python synchronization. (For the record, I do think it is possible, and indeed that is exactly what I'm working on. To make it work will require a compelling, unified object model, forwarding the art of Computer Science...) markj

On Fri, Oct 12, 2012 at 8:06 PM, Mike Graham <mikegraham@gmail.com> wrote:
...several **systems**? I mean, you can accomplish such a task on a *particular* OS, but I don't know where this is the case across *several* systems (Unix, Mac, and Windows). I would like to know of an example, if you have one? mark

On Fri, Oct 12, 2012 at 3:32 PM, Guido van Rossum <guido@python.org> wrote:
Why is subclassing a problem? It can be overused, but seems the right thing to do in this case. You want a protocol that responds to new data by echoing and tells the user when the connection was terminated? It makes sense that this is a subclass: a special case of some class that handles the base behavior. What if this was just an optional way and we could also provide a helper to attach handlers to the base class instance without subclassing it? The function registering it could take keyword arguments mapping additional event->callbacks to the object.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy
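A sketch of the non-subclassing helper Calvin describes above (the event names and the helper itself are invented here for illustration): plain callables are bound onto an existing protocol instance, shadowing the base class's default handlers:

    def attach_handlers(protocol, **handlers):
        # Bind event-name -> callback pairs onto an existing protocol instance.
        allowed = {"connection_made", "data_received", "connection_lost"}
        for event, callback in handlers.items():
            if event not in allowed:
                raise TypeError("unknown event: %r" % (event,))
            setattr(protocol, event, callback)
        return protocol

    # Usage (hypothetical base class and event names):
    #   proto = attach_handlers(BaseProtocol(),
    #                           data_received=lambda data: print("got", data),
    #                           connection_lost=lambda exc: print("closed", exc))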

On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman <ironfroggy@gmail.com> wrote:
I replied to this in detail on the "Twisted and Deferreds" thread in an exchange. Summary: I'm -0 when it comes to subclassing protocol classes; -1 on subclassing objects that implement significant functionality.
Yeah, there are many APIs that we could offer. We just have to offer one that's general enough so that people who prefer other styles can implement their preferred style in a library. -- --Guido van Rossum (python.org/~guido)

On Mon, Oct 15, 2012 at 11:33 AM, Guido van Rossum <guido@python.org> wrote:
But you're still stuck with implementing the names that someone else decided upon a decade ago... :-)
And why is that a bad thing? I don't see the value in having something like: thing.set_data_received_callback(self.bake_some_eggs) We're going to have to give *something* a name, eventually. Why not pick it at the most direct level?
-- Jasper

On Mon, Oct 15, 2012 at 8:39 AM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
But I do, and you've pinpointed exactly my argument. My code is all about baking an egg, and (from my POV) it's secondary that it's invoked by the reactor when data is received.
We're going to have to give *something* a name, eventually. Why not pick it at the most direct level?
Let the reactor pick *its* names (e.g. set_data_received_callback). Then I can pick mine. -- --Guido van Rossum (python.org/~guido)

On Tue, Oct 16, 2012 at 1:33 AM, Guido van Rossum <guido@python.org> wrote:
But you're still stuck with implementing the names that someone else decided upon a decade ago... :-)
There's a certain benefit to everyone using the same names and being able to read each other's code, even when there's a (small?) risk of the names not aging well. Do we really want the first step in deciphering someone else's async code to be "OK, what did they call their connection and data processing callbacks?"? Twisted's IProtocol API is pretty simple:

- makeConnection
- connectionMade
- dataReceived
- connectionLost

Everything else is up to the individual protocols (including whether or not they offer a "write" method). The transport and producer/consumer APIs aren't much more complicated (https://twistedmatrix.com/documents/current/core/howto/producers.html) and make rather a lot of sense. The precise *shape* of those APIs is likely to be different in a generator-based system, and I assume we'd want to lose the camel-case names, but standardising the terminology seems like a good idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
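For reference, a protocol implementing those four names can be as small as the following echo example (shown as a plain class for illustration, not Twisted's actual interface machinery; the transport is assumed to offer a write() method):

    class EchoProtocol:
        def makeConnection(self, transport):
            # The framework hands us the transport, then signals connectionMade.
            self.transport = transport
            self.connectionMade()

        def connectionMade(self):
            print("connection established")

        def dataReceived(self, data):
            self.transport.write(data)      # echo everything straight back

        def connectionLost(self, reason):
            print("connection closed:", reason)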

On Mon, Oct 15, 2012 at 5:56 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
While I agree with everything else you're saying, write may be a bad example: it's generally something on the *transport*, and it's an interface method (ie always available) there.
Cheers, Nick.
-- cheers lvh

On Mon, Oct 15, 2012 at 8:56 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess you see it as a template pattern, where everybody has to implement the same state machine *somehow*. Like having to implement a file-like object, or a mapping. I'm still convinced that the alternate POV is just as valid in this case, but I'm going to let it rest because it doesn't matter enough to me to keep arguing. -- --Guido van Rossum (python.org/~guido)

Antoine Pitrou wrote:
One reason might be that it more or less forces you to subclass the I/O object, instead of just using one of a few predefined ones for file, socket, etc. Although this could be ameliorated by giving the standard I/O objects the ability to have callbacks plugged into them. Then you could use whichever style was most convenient. -- Greg

Guido van Rossum wrote:
- There's an abstract Reactor class and an abstract Async I/O object class.
Can we please use a better term than "reactor" for this? Its meaning is only obvious to someone familiar with Twisted. Not being such a person, it's taken me a while to figure out from this discussion that it refers to the central object implementing the event loop, and not one of the user-supplied objects that could equally well be described as "reacting" to events. Something like "dispatcher" would be clearer, IMO. -- Greg

On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Sorry about that. I'm afraid it's too late for this thread's subject line, but I will try to make sure that if and when this makes it into the standard library it'll have a more appropriate name. I would recommend event loop (which is the name I naturally would give it when asked out of context) or I/O loop, which is what Tornado apparently used. Dispatcher would not be my first choice. FWIW, it's not a completely Twisted-specific term: http://en.wikipedia.org/wiki/Reactor_pattern -- --Guido van Rossum (python.org/~guido)

On 10/12/2012 8:26 PM, Guido van Rossum wrote:
On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Thanks for the clarification. Reactors react to events within an event loop* by dispatching them to handlers. Correct? *Iteration rather than recursion is required because they continue the cycle indefinitely. I am still fuzzy on edge-triggered versus level-triggered in this context, as opposed to electronics. -- Terry Jan Reedy

On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum <guido@python.org> wrote:
First of all, to clear up the terminology, edge-triggered actually has a specific meaning in this context that is separate from the question of whether callbacks are used more than once. The edge- vs level-triggered question is moot with one-shot callbacks, but when you're reusing callbacks in edge-triggered mode you won't get a second call until you've drained the socket buffer and then it becomes readable again. This turns out to be helpful for hybrid event/threaded systems, since the network thread may go into the next iteration of its loop while the worker thread is still consuming the data from a previous event. You can't always emulate edge-triggered behavior since it needs knowledge of internal socket buffers (epoll has an edge-triggered mode and I think kqueue does too, but you can't get edge-triggered behavior if you're falling back to select()). However, you can easily get one-shot callbacks from an event loop with persistent callbacks just by unregistering the callback once it has received an event. This has a performance cost, though - in tornado we try to avoid unnecessary unregister/register pairs.
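The one-shot-from-persistent trick Ben mentions is tiny; here is a sketch against the hypothetical add_reader/remove_reader interface used earlier (the unregister/register churn is exactly the cost he notes Tornado tries to avoid):

    def call_once_when_readable(reactor, ioobj, callback, *args):
        # Emulate a one-shot callback using a persistent (level-triggered) one.
        def one_shot():
            reactor.remove_reader(ioobj)    # unregister first, so we fire exactly once
            callback(*args)
        reactor.add_reader(ioobj, one_shot)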
One reason to have a distinct method for running a bare callback is that you need to have some thread-safe entry point, but you otherwise don't really want locking on all the internal methods. Tornado's IOLoop.add_callback and Twisted's Reactor.callFromThread can be used to run code in the IOLoop's thread (which can then call the other IOLoop methods). We also have distinct methods for running a callback after a timeout, although if you had a variant of add_handler that didn't require a subsequent call to remove_handler you could probably do timeouts using a magical IO object. (an additional subtlety for the time-based methods is how time is computed. I recently added support in tornado to optionally use time.monotonic instead of time.time)
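The general idea behind such a thread-safe entry point (this is only a sketch of the usual self-pipe approach, not the actual Tornado or Twisted code) is a single locked queue plus a pipe write to wake a select() that is blocked in the event loop:

    import os
    import threading

    class _PipeEnd:
        # Minimal wrapper so the read end of the pipe can be registered
        # with a reactor that expects objects exposing fileno().
        def __init__(self, fd):
            self._fd = fd
        def fileno(self):
            return self._fd

    class CallbackScheduler:
        def __init__(self, reactor):
            self._lock = threading.Lock()   # the only lock; the rest of the loop stays single-threaded
            self._pending = []
            self._read_fd, self._write_fd = os.pipe()
            reactor.add_reader(_PipeEnd(self._read_fd), self._run_pending)

        def add_callback(self, callback, *args):
            # Safe to call from any thread.
            with self._lock:
                self._pending.append((callback, args))
            os.write(self._write_fd, b"x")  # wake up a select() blocked in the loop

        def _run_pending(self):
            os.read(self._read_fd, 4096)    # drain the wakeup bytes
            with self._lock:
                pending, self._pending = self._pending, []
            for callback, args in pending:
                callback(*args)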
Jython is another interesting case - it has a select() function that doesn't take integer file descriptors, just the opaque objects returned by socket.fileno(). While it's convenient to have higher-level constructors for various specialized types, I'd like to emphasize that having the low-level interface is important for interoperability. Tornado doesn't know whether the file descriptors are listening sockets, connected sockets, or pipes, so we'd just have to pass in a file descriptor with no other information.
Why would you be allowed to make IO objects for sockets that don't work? I would expect that to just raise an exception. On app engine RPCs would be the only supported async I/O objects (and timers, if those are implemented as magic I/O objects), and they're not implemented in terms of sockets or files. -Ben

On Fri, Oct 12, 2012 at 9:52 PM, Ben Darnell <ben@bendarnell.com> wrote:
Yeah, sorry for contributing to the confusion here! Glyph cleared it up for me.
We should be careful to support all this in our event loop design, without necessarily offering two ways of doing everything -- the event loop should be at liberty to use the most efficient strategy for the platform. (If that depends on what sort of I/O the user is interested in, we should be sure that that information reaches the event loop too.) I like the idea more and more of an IO object that encapsulates a socket or other event source, using predefined subclasses for each type that is relevant to the platform.
That's an important use case to support.
Interesting.
Yeah, the IO object will still need to have a fileno() method.
Here's my use case. Suppose in general one can use async I/O for disk files, and it is integrated with the standard (abstract) event loop. So someone writes a handy templating library that wants to play nice with async apps, so it uses the async I/O idiom to read e.g. the template source code. Suppose I want to use that library on App Engine. It would be a pain if I had to modify that template-reading code to not use the async API. But (given the right async API!) it would be pretty simple for the App Engine API to provide a mock implementation of the async file reading API that was synchronous under the hood. Yes, it would block while waiting for disk, but App Engine uses threads anyway so it wouldn't be a problem.

Another, current-day, use case is the httplib interface in the stdlib (a fairly fancy HTTP/1.1 client, although it has its flaws). That's based on sockets, which App Engine doesn't have; we have a "urlfetch" RPC that you give a URL (and more optional stuff) and which returns a record containing the contents and headers. But again, many useful 3rd party libraries use httplib, and they won't work unless we somehow support httplib. So we have had to go out of our way to cover most uses of httplib. While the app believes it is opening the connection and sending the request, we are actually just buffering everything; and when the app starts reading from the connection, we make the urlfetch RPC and buffer the response, which we then feed back to the app as it believes it is reading from the socket. As long as the app doesn't try to get the socket's file descriptor and call select() it will work fine. But some libraries *do* call select(), and here our emulation breaks down.

It would be nicer if the standard way to do async stuff was higher level than select(), so that we could offer the emulation at a level that would integrate with the event loop -- that way, ideally, when we have to send the urlfetch RPC we could actually return a Future (or whatever), and the task would correctly be suspended, just *thinking* it was waiting for the response on a socket, but actually waiting for the RPC. Hopefully SSL provides another use case. -- --Guido van Rossum (python.org/~guido)
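In the simplest (fully synchronous) form, such an emulation can just do the blocking work up front and hand back an already-completed Future, so code written against the async API keeps working unchanged. A sketch (async_read_file is an invented name; the real App Engine adaptation is of course more involved):

    from concurrent.futures import Future

    def async_read_file(path):
        # Fake "async" file read: blocks under the hood, returns a ready Future.
        future = Future()
        try:
            with open(path, "rb") as f:
                future.set_result(f.read())
        except OSError as exc:
            future.set_exception(exc)
        return future

    # A library written against the async API just waits on the future as usual:
    #   data = async_read_file("template.html").result()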

On Sun, Oct 14, 2012 at 10:15 AM, Guido van Rossum <guido@python.org> wrote:
They also need to be constructible given nothing but a fileno (but more on this later)
Understood.
Hopefully SSL provides another use case.
In posix-land, SSL isn't that different from regular sockets (using ssl.wrap_socket from the 2.6+ stdlib). The connection process is a little more complicated, and it gets hairy if you want to support renegotiation, but once a connection is established you can select() on its file descriptor and generally use it just like a regular socket. On IOCP it's another story, though.

I've finally gotten around to reading up on IOCP and see how it's so different from everything I'm used to (a lot of Twisted's design decisions at the reactor level make a lot more sense now). Earlier you had mentioned platform-specific constructors for IOObjects, but it actually needs to be event-loop-specific: on Windows you can use select() or IOCP, and the IOObjects would be completely different for each of them (and I do think you need to support both -- select() is kind of a second-class citizen on Windows but is useful due to its ubiquity). This means that the event loop needs to be involved in the creation of these objects, which is why Twisted has connectTCP, listenTCP, listenUDP, connectSSL, etc. methods on the reactor interface.

I think that in order to handle both IOCP and select-style event loops you'll need a very broad interface (roughly the union of Twisted's IReactor{Core, Time, Thread, TCP, UDP, SSL} as a minimum, with IReactorFDSet and maybe IReactorSocket on posix for compatibility with existing posixy practices). Basically, an event loop that supports IOCP (or hopes to support it in the future) will end up looking a lot like the bottom couple of layers of Twisted (and assuming IOCP is a requirement I wouldn't want to stray too far from Twisted's designs here). -Ben

participants (15)
- Antoine Pitrou
- Ben Darnell
- Calvin Spealman
- Greg Ewing
- Guido van Rossum
- Jasper St. Pierre
- Laurens Van Houtven
- Mark Adam
- Mike Graham
- Mike Meyer
- Nick Coghlan
- Rene Nejsum
- Richard Oudkerk
- Terry Reedy
- Trent Nelson