On Fri, Oct 12, 2012 at 9:52 PM, Ben Darnell <ben@bendarnell.com> wrote:
First of all, to clear up the terminology, edge-triggered actually has a specific meaning in this context that is separate from the question of whether callbacks are used more than once. The edge- vs level-triggered question is moot with one-shot callbacks, but when you're reusing callbacks in edge-triggered mode you won't get a second call until you've drained the socket buffer and then it becomes readable again. This turns out to be helpful for hybrid event/threaded systems, since the network thread may go into the next iteration of its loop while the worker thread is still consuming the data from a previous event.
Yeah, sorry for contributing to the confusion here! Glyph cleared it up for me.
You can't always emulate edge-triggered behavior since it needs knowledge of internal socket buffers (epoll has an edge-triggered mode and I think kqueue does too, but you can't get edge-triggered behavior if you're falling back to select()). However, you can easily get one-shot callbacks from an event loop with persistent callbacks just by unregistering the callback once it has received an event. This has a performance cost, though - in tornado we try to avoid unnecessary unregister/register pairs.
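As a rough illustration of getting one-shot callbacks out of a persistent registration, here is a minimal sketch. It assumes the stdlib selectors module (which postdates this thread) as a stand-in for the event loop's poller; add_one_shot and run_once are hypothetical helper names, not tornado or Twisted API.

```python
import selectors
import socket


def add_one_shot(sel, sock, callback):
    """Register callback for a single readable event on sock (hypothetical helper)."""
    def wrapper(s):
        # Drop the persistent registration first, so the callback fires only once.
        sel.unregister(s)
        callback(s)
    sel.register(sock, selectors.EVENT_READ, wrapper)


def run_once(sel, timeout=1.0):
    """Dispatch whatever is ready right now; returns the number of events handled."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(key.fileobj)
    return len(events)


sel = selectors.DefaultSelector()
a, b = socket.socketpair()
seen = []

add_one_shot(sel, a, lambda s: seen.append(s.recv(1024)))
b.send(b"ping")
handled = run_once(sel)

# After the first event the socket is no longer registered.
remaining = len(sel.get_map())

sel.close()
a.close()
b.close()
```

The unregister call inside the wrapper is exactly the extra unregister/register traffic that costs performance when the caller immediately wants another event.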
We should be careful to support all this in our event loop design, without necessarily offering two ways of doing everything -- the event loop should be at liberty to use the most efficient strategy for the platform. (If that depends on what sort of I/O the user is interested in, we should be sure that that information reaches the event loop too.) I like the idea more and more of an IO object that encapsulates a socket or other event source, using predefined subclasses for each type that is relevant to the platform.
I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:
- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back to liking optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.
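To make the shape of that interface concrete, here is a minimal sketch. Every name in it (IOObject, SocketIO, Reactor, add_level_triggered, add_edge_triggered, call_later, RecordingReactor) is an assumption made up for illustration; none of them comes from Twisted, tornado, or any agreed design. The toy reactor only records registrations and does no real I/O.

```python
import abc
import socket


class IOObject(abc.ABC):
    """Hypothetical base class for anything a reactor can wait on."""


class SocketIO(IOObject):
    """Hypothetical I/O object wrapping a socket; fileno() is for fd-based reactors."""

    def __init__(self, sock):
        self.sock = sock

    def fileno(self):
        return self.sock.fileno()


class Reactor(abc.ABC):
    """Hypothetical abstract reactor: the callback is passed in, not a method on io."""

    @abc.abstractmethod
    def add_level_triggered(self, io, callback, *args):
        """Ask for callback(*args) for as long as io is ready."""

    @abc.abstractmethod
    def add_edge_triggered(self, io, callback, *args):
        """Ask for callback(*args) each time io becomes ready."""

    @abc.abstractmethod
    def call_later(self, delay, callback, *args):
        """Schedule a 'bare' callback after delay seconds (0 means as soon as possible)."""


class RecordingReactor(Reactor):
    """Toy implementation that only records what was requested."""

    def __init__(self):
        self.registrations = []

    def add_level_triggered(self, io, callback, *args):
        self.registrations.append(("level", io, callback, args))

    def add_edge_triggered(self, io, callback, *args):
        self.registrations.append(("edge", io, callback, args))

    def call_later(self, delay, callback, *args):
        self.registrations.append(("timer", delay, callback, args))


def on_readable(label):
    print("readable:", label)


a, b = socket.socketpair()
reactor = RecordingReactor()
# Optional args are passed through, so no lambda is needed to build a closure.
reactor.add_edge_triggered(SocketIO(a), on_readable, "client-1")
reactor.call_later(0, on_readable, "bare")
a.close()
b.close()
```

Choosing the triggering mode by method name keeps the I/O object itself free of any callback or mode state, which is the point of not making the callback a designated method on it.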
One reason to have a distinct method for running a bare callback is that you need to have some thread-safe entry point, but you otherwise don't really want locking on all the internal methods. Tornado's IOLoop.add_callback and Twisted's Reactor.callFromThread can be used to run code in the IOLoop's thread (which can then call the other IOLoop methods).
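A sketch of what a single thread-safe entry point can look like, under assumptions: MiniLoop is a hypothetical toy loop, not tornado's IOLoop or Twisted's reactor, and it uses the stdlib selectors module plus a socket pair to wake the loop. Only add_callback takes the lock; everything else is meant to be called from the loop's own thread.

```python
import collections
import selectors
import socket
import threading


class MiniLoop:
    """Toy loop (hypothetical) whose only thread-safe method is add_callback."""

    def __init__(self):
        self._callbacks = collections.deque()
        self._lock = threading.Lock()
        self._sel = selectors.DefaultSelector()
        # Writing a byte to _wake_w makes select() in run() return.
        self._wake_r, self._wake_w = socket.socketpair()
        self._wake_r.setblocking(False)
        self._sel.register(self._wake_r, selectors.EVENT_READ)
        self._running = False

    def add_callback(self, callback, *args):
        """Safe to call from any thread: queue the callback and wake the loop."""
        with self._lock:
            self._callbacks.append((callback, args))
        self._wake_w.send(b"\0")

    def stop(self):
        """Not thread-safe: call it from the loop thread, e.g. via add_callback."""
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            self._sel.select()
            try:
                self._wake_r.recv(4096)  # drain pending wake-up bytes
            except BlockingIOError:
                pass
            with self._lock:
                pending = list(self._callbacks)
                self._callbacks.clear()
            for callback, args in pending:
                callback(*args)


loop = MiniLoop()
results = []


def worker():
    # Runs in another thread; it only ever touches add_callback.
    loop.add_callback(results.append, "from worker thread")
    loop.add_callback(loop.stop)


t = threading.Thread(target=worker)
t.start()
loop.run()
t.join()
```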
That's an important use case to support.
We also have distinct methods for running a callback after a timeout, although if you had a variant of add_handler that didn't require a subsequent call to remove_handler you could probably do timeouts using a magical IO object. (An additional subtlety for the time-based methods is how time is computed. I recently added support in tornado to optionally use time.monotonic instead of time.time.)
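The clock subtlety can be sketched as follows. TimeoutQueue is a hypothetical helper, not tornado's implementation; it takes the clock as a parameter and defaults to time.monotonic, so wall-clock adjustments don't fire or delay timeouts. The usage part swaps in a fake clock so the behaviour is deterministic.

```python
import heapq
import itertools
import time


class TimeoutQueue:
    """Hypothetical timeout store; the clock is pluggable (time.monotonic by default)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []
        # Tie-breaker so entries with equal deadlines never compare callbacks.
        self._counter = itertools.count()

    def add_timeout(self, delay, callback, *args):
        deadline = self._clock() + delay
        heapq.heappush(self._heap, (deadline, next(self._counter), callback, args))

    def run_due(self):
        """Run every callback whose deadline has passed; return how many ran."""
        now = self._clock()
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _deadline, _seq, callback, args = heapq.heappop(self._heap)
            callback(*args)
            ran += 1
        return ran


# Deterministic demonstration with a fake clock instead of real time.
fake_now = [100.0]
timeouts = TimeoutQueue(clock=lambda: fake_now[0])
fired = []

timeouts.add_timeout(5.0, fired.append, "later")
timeouts.add_timeout(1.0, fired.append, "sooner")

fake_now[0] += 2.0
first = timeouts.run_due()   # only the 1-second timeout is due

fake_now[0] += 10.0
second = timeouts.run_due()  # now the 5-second timeout is due
```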
- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.
Jython is another interesting case - it has a select() function that doesn't take integer file descriptors, just the opaque objects returned by socket.fileno().
Interesting.
While it's convenient to have higher-level constructors for various specialized types, I'd like to emphasize that having the low-level interface is important for interoperability. Tornado doesn't know whether the file descriptors are listening sockets, connected sockets, or pipes, so we'd just have to pass in a file descriptor with no other information.
Yeah, the IO object will still need to have a fileno() method.
- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).
Why would you be allowed to make IO objects for sockets that don't work? I would expect that to just raise an exception. On App Engine, RPCs would be the only supported async I/O objects (and timers, if those are implemented as magic I/O objects), and they're not implemented in terms of sockets or files.
Here's my use case. Suppose in general one can use async I/O for disk files, and it is integrated with the standard (abstract) event loop. Someone writes a handy templating library that wants to play nice with async apps, so it uses the async I/O idiom to read e.g. the template source code. Suppose I want to use that library on App Engine. It would be a pain if I had to modify that template-reading code to not use the async API. But (given the right async API!) it would be pretty simple for the App Engine API to provide a mock implementation of the async file reading API that was synchronous under the hood. Yes, it would block while waiting for disk, but App Engine uses threads anyway so it wouldn't be a problem.
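A minimal sketch of such a mock, under assumptions: read_file_async and load_template are hypothetical names, and concurrent.futures.Future stands in for whatever the async API would actually return. The read happens synchronously, but the caller still sees the async idiom.

```python
import os
import tempfile
from concurrent.futures import Future


def read_file_async(path):
    """Hypothetical async-file API; this fallback reads synchronously
    and hands back a Future that is already completed."""
    future = Future()
    try:
        with open(path, "rb") as f:
            future.set_result(f.read())
    except OSError as exc:
        future.set_exception(exc)
    return future


def load_template(path, on_loaded):
    """Library-style code written against the async idiom only."""
    def done(fut):
        on_loaded(fut.result().decode("utf-8"))
    # On an already-completed Future the callback runs immediately.
    read_file_async(path).add_done_callback(done)


# Demonstration with a throwaway template file.
with tempfile.NamedTemporaryFile("w", suffix=".tmpl", delete=False) as tmp:
    tmp.write("Hello, {{ name }}!")
    template_path = tmp.name

loaded = []
load_template(template_path, loaded.append)
os.remove(template_path)
```

The template library never learns whether the read was truly asynchronous, so the same code could run unchanged on a platform with real async file I/O.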
Another, current-day, use case is the httplib interface in the stdlib (a fairly fancy HTTP/1.1 client, although it has its flaws). That's based on sockets, which App Engine doesn't have; we have a "urlfetch" RPC that takes a URL (and more optional stuff) and returns a record containing the contents and headers. But again, many useful 3rd party libraries use httplib, and they won't work unless we somehow support httplib. So we have had to go out of our way to cover most uses of httplib. While the app believes it is opening the connection and sending the request, we are actually just buffering everything; and when the app starts reading from the connection, we make the urlfetch RPC and buffer the response, which we then feed back to the app as it believes it is reading from the socket. As long as the app doesn't try to get the socket's file descriptor and call select() it will work fine.
But some libraries *do* call select(), and here our emulation breaks down. It would be nicer if the standard way to do async stuff was higher level than select(), so that we could offer the emulation at a level that would integrate with the event loop -- that way, ideally when we have to send the urlfetch RPC we could actually return a Future (or whatever), and the task would correctly be suspended, just *thinking* it was waiting for the response on a socket, but actually waiting for the RPC.
Hopefully SSL provides another use case.