On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum wrote:
[This is the first spin-off thread from "asyncore: included batteries don't fit"]
On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell wrote:
On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote:
Re base reactor interface: drawing maximally from the lessons learned in Twisted, I think IReactorCore (start, stop, etc.), IReactorTime (call later, etc.), asynchronous-looking name lookup, and fd handling are the important parts.
That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future.
So are you thinking of something like reactor.add_event_listener(event_type, event_params, func)? One thing to keep in mind is that file descriptors are somewhat special (at least in a level-triggered event loop), because of the way the event will keep firing until the socket buffer is drained or the event is unregistered. I'd be inclined to keep file descriptors in the interface even if they just raise an error on App Engine, since they're fairly fundamental to the (unixy) event loop. On the other hand, I don't have any experience with event loops outside the unix/network world, so I don't know what other systems might need from their event loops.
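As a sketch of how such an interface might look, here is a toy reactor whose fd methods exist in the interface but raise on a platform without fd I/O, while abstract events (like an RPC completing on App Engine) still work. All names here (Reactor, add_reader, add_event_listener, AppEngineReactor) are illustrative, not a proposed API:

```python
# Toy sketch: fd methods stay in the interface but may raise on
# platforms (like App Engine) that cannot do async I/O on fds.

class Reactor:
    def add_reader(self, fd, callback):
        """Level-triggered: call `callback` whenever `fd` is readable."""
        raise NotImplementedError

    def add_event_listener(self, event_type, event_params, callback):
        """Platform-specific events (e.g. an RPC completing)."""
        raise NotImplementedError


class AppEngineReactor(Reactor):
    # No file descriptors here: only abstract events are supported,
    # so add_reader keeps raising NotImplementedError.
    def __init__(self):
        self.listeners = {}

    def add_event_listener(self, event_type, event_params, callback):
        self.listeners.setdefault(event_type, []).append((event_params, callback))

    def fire(self, event_type, result):
        # Deliver a completed event to every registered listener.
        for params, callback in self.listeners.get(event_type, []):
            callback(result)
```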
Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea. In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval!
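The bounded buffered-stream idea above can be sketched as a level-triggered read callback that keeps filling its buffer but unregisters itself once the buffer hits a cap, so a hostile peer cannot fill all our memory. The reactor and the cap below are toy stand-ins, not a proposed API:

```python
# Toy sketch of backpressure on a level-triggered read callback.

class ToyReactor:
    def __init__(self):
        self.readers = {}

    def add_reader(self, fd, callback):
        self.readers[fd] = callback

    def remove_reader(self, fd):
        self.readers.pop(fd, None)


class BufferedReader:
    MAX_BUFFER = 16  # tiny cap for demonstration; real code would use e.g. 64 KiB

    def __init__(self, reactor, fd, recv):
        self.reactor = reactor
        self.fd = fd
        self.recv = recv          # callable returning the next chunk of bytes
        self.buffer = bytearray()
        reactor.add_reader(fd, self.on_readable)

    def on_readable(self):
        self.buffer.extend(self.recv())
        if len(self.buffer) >= self.MAX_BUFFER:
            # Backpressure: stop listening until the app drains the buffer.
            self.reactor.remove_reader(self.fd)
```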
First of all, to clear up the terminology: edge-triggered actually has a specific meaning in this context that is separate from the question of whether callbacks are used more than once. The edge- vs level-triggered question is moot with one-shot callbacks, but when you're reusing callbacks in edge-triggered mode you won't get a second call until you've drained the socket buffer and then it becomes readable again. This turns out to be helpful for hybrid event/threaded systems, since the network thread may go into the next iteration of its loop while the worker thread is still consuming the data from a previous event. You can't always emulate edge-triggered behavior, since it needs knowledge of internal socket buffers (epoll has an edge-triggered mode and I think kqueue does too, but you can't get edge-triggered behavior if you're falling back to select()). However, you can easily get one-shot callbacks from an event loop with persistent callbacks just by unregistering the callback once it has received an event. This has a performance cost, though: in Tornado we try to avoid unnecessary unregister/register pairs.
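The one-shot-on-top-of-persistent trick described above is a small wrapper: it unregisters the callback on the first event before invoking it. Names here are illustrative (the register/unregister pair is exactly the cost the paragraph mentions):

```python
# Toy sketch: one-shot semantics built on a persistent (level-triggered)
# registration, by unregistering inside the wrapper on the first event.

class ToyReactor:
    def __init__(self):
        self.readers = {}

    def add_reader(self, fd, callback):
        self.readers[fd] = callback

    def remove_reader(self, fd):
        self.readers.pop(fd, None)


def add_one_shot(reactor, fd, callback):
    def wrapper(*args):
        reactor.remove_reader(fd)  # unregister first, so re-entrant events can't re-fire
        callback(*args)
    reactor.add_reader(fd, wrapper)
```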
I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:
- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and forth on this, and I now like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.
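A minimal sketch of the bare-callback scheduling described above, with optional positional args so no lambda is needed (all names hypothetical, not a proposed API):

```python
# Toy reactor supporting "bare" callbacks: immediately (call_soon) or
# after a delay (call_later), with extra args passed through.

import heapq
import itertools
import time


class Reactor:
    def __init__(self):
        self._queue = []                 # heap of (when, seq, callback, args)
        self._seq = itertools.count()    # tiebreaker preserving FIFO order

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._queue, (when, next(self._seq), callback, args))

    def run_once(self):
        # Run every callback whose scheduled time has arrived.
        now = time.monotonic()
        while self._queue and self._queue[0][0] <= now:
            _, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
```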
One reason to have a distinct method for running a bare callback is that you need to have some thread-safe entry point, but you otherwise don't really want locking on all the internal methods. Tornado's IOLoop.add_callback and Twisted's reactor.callFromThread can be used to run code in the IOLoop's thread (which can then call the other IOLoop methods). We also have distinct methods for running a callback after a timeout, although if you had a variant of add_handler that didn't require a subsequent call to remove_handler, you could probably do timeouts using a magical IO object. (An additional subtlety for the time-based methods is how time is computed. I recently added support in Tornado to optionally use time.monotonic instead of time.time.)
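The single thread-safe entry point described above is commonly built from a lock around the pending list plus a self-pipe to wake a loop that is blocked in select(). This mirrors the pattern behind Tornado's IOLoop.add_callback and Twisted's callFromThread, but the class below is an illustrative toy, not either library's implementation:

```python
# Toy sketch: add_callback is the only method safe to call from other
# threads; a lock guards the pending list, and a self-pipe wakes the loop.

import os
import select
import threading


class ToyLoop:
    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []
        self._waker_r, self._waker_w = os.pipe()

    def add_callback(self, callback, *args):
        # Thread-safe entry point: no other method needs locking.
        with self._lock:
            self._callbacks.append((callback, args))
        os.write(self._waker_w, b"x")   # interrupt a blocking select()

    def run_once(self, timeout=0.1):
        readable, _, _ = select.select([self._waker_r], [], [], timeout)
        if readable:
            os.read(self._waker_r, 4096)  # drain the waker pipe
        with self._lock:
            callbacks, self._callbacks = self._callbacks, []
        for callback, args in callbacks:
            callback(*args)
```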
- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.
Jython is another interesting case - it has a select() function that doesn't take integer file descriptors, just the opaque objects returned by socket.fileno(). While it's convenient to have higher-level constructors for various specialized types, I'd like to emphasize that having the low-level interface is important for interoperability. Tornado doesn't know whether the file descriptors are listening sockets, connected sockets, or pipes, so we'd just have to pass in a file descriptor with no other information.
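The low-level, fd-only interface argued for above can be as small as this: a dispatch step that knows nothing about what kind of object it is watching, only that select() accepts it (an integer fd, or an opaque object with fileno(), as on Jython). This is an illustrative sketch, not Tornado's actual code:

```python
# Toy sketch: dispatch on readability without knowing whether the
# watched objects are listening sockets, connected sockets, or pipes.

import select
import socket


def watch_until_readable(handlers, timeout=1.0):
    # handlers: {selectable: callback}; anything select() accepts works.
    readable, _, _ = select.select(list(handlers), [], [], timeout)
    for obj in readable:
        handlers[obj](obj)
```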
- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).
Why would you be allowed to make IO objects for sockets that don't work? I would expect that to just raise an exception. On app engine RPCs would be the only supported async I/O objects (and timers, if those are implemented as magic I/O objects), and they're not implemented in terms of sockets or files. -Ben