On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum wrote:
[This is the first spin-off thread from "asyncore: included batteries don't fit"]
On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell wrote:
On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote:
Re base reactor interface: drawing maximally from the lessons learned in Twisted, I think IReactorCore (start, stop, etc.), IReactorTime (call later, etc.), asynchronous-looking name lookup, and fd handling are the important parts.
That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future.
So are you thinking of something like reactor.add_event_listener(event_type, event_params, func)? One thing to keep in mind is that file descriptors are somewhat special (at least in a level-triggered event loop), because of the way the event will keep firing until the socket buffer is drained or the event is unregistered. I'd be inclined to keep file descriptors in the interface even if they just raise an error on App Engine, since they're fairly fundamental to the (unixy) event loop. On the other hand, I don't have any experience with event loops outside the unix/network world, so I don't know what other systems might need from their event loops.
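As a sketch of how such an interface might look, here is a toy reactor whose fd methods exist in the interface but raise on a platform without fd I/O, while abstract events (like an RPC completing on App Engine) still work. All names here (Reactor, add_reader, add_event_listener, AppEngineReactor) are illustrative, not a proposed API:

```python
# Toy sketch: fd methods stay in the interface but may raise on
# platforms (like App Engine) that cannot do async I/O on fds.

class Reactor:
    def add_reader(self, fd, callback):
        """Level-triggered: call `callback` whenever `fd` is readable."""
        raise NotImplementedError

    def add_event_listener(self, event_type, event_params, callback):
        """Platform-specific events (e.g. an RPC completing)."""
        raise NotImplementedError


class AppEngineReactor(Reactor):
    # No file descriptors here: only abstract events are supported,
    # so add_reader keeps raising NotImplementedError.
    def __init__(self):
        self.listeners = {}

    def add_event_listener(self, event_type, event_params, callback):
        self.listeners.setdefault(event_type, []).append((event_params, callback))

    def fire(self, event_type, result):
        # Deliver a completed event to every registered listener.
        for params, callback in self.listeners.get(event_type, []):
            callback(result)
```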
Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea. In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval!
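The bounded buffered-stream idea above can be sketched as a level-triggered read callback that keeps filling its buffer but unregisters itself once the buffer hits a cap, so a hostile peer cannot fill all our memory. The reactor and the cap below are toy stand-ins, not a proposed API:

```python
# Toy sketch of backpressure on a level-triggered read callback.

class ToyReactor:
    def __init__(self):
        self.readers = {}

    def add_reader(self, fd, callback):
        self.readers[fd] = callback

    def remove_reader(self, fd):
        self.readers.pop(fd, None)


class BufferedReader:
    MAX_BUFFER = 16  # tiny cap for demonstration; real code would use e.g. 64 KiB

    def __init__(self, reactor, fd, recv):
        self.reactor = reactor
        self.fd = fd
        self.recv = recv          # callable returning the next chunk of bytes
        self.buffer = bytearray()
        reactor.add_reader(fd, self.on_readable)

    def on_readable(self):
        self.buffer.extend(self.recv())
        if len(self.buffer) >= self.MAX_BUFFER:
            # Backpressure: stop listening until the app drains the buffer.
            self.reactor.remove_reader(self.fd)
```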
First of all, to clear up the terminology: edge-triggered actually has a specific meaning in this context that is separate from the question of whether callbacks are used more than once. The edge- vs level-triggered question is moot with one-shot callbacks, but when you're reusing callbacks in edge-triggered mode you won't get a second call until you've drained the socket buffer and then it becomes readable again. This turns out to be helpful for hybrid event/threaded systems, since the network thread may go into the next iteration of its loop while the worker thread is still consuming the data from a previous event. You can't always emulate edge-triggered behavior, since it needs knowledge of internal socket buffers (epoll has an edge-triggered mode and I think kqueue does too, but you can't get edge-triggered behavior if you're falling back to select()). However, you can easily get one-shot callbacks from an event loop with persistent callbacks just by unregistering the callback once it has received an event. This has a performance cost, though: in Tornado we try to avoid unnecessary unregister/register pairs.
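The one-shot-on-top-of-persistent trick described above is a small wrapper: it unregisters the callback on the first event before invoking it. Names here are illustrative (the register/unregister pair is exactly the cost the paragraph mentions):

```python
# Toy sketch: one-shot semantics built on a persistent (level-triggered)
# registration, by unregistering inside the wrapper on the first event.

class ToyReactor:
    def __init__(self):
        self.readers = {}

    def add_reader(self, fd, callback):
        self.readers[fd] = callback

    def remove_reader(self, fd):
        self.readers.pop(fd, None)


def add_one_shot(reactor, fd, callback):
    def wrapper(*args):
        reactor.remove_reader(fd)  # unregister first, so re-entrant events can't re-fire
        callback(*args)
    reactor.add_reader(fd, wrapper)
```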
I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:
- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and forth on this, and I now like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.
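A minimal sketch of the bare-callback scheduling described above, with optional positional args so no lambda is needed (all names hypothetical, not a proposed API):

```python
# Toy reactor supporting "bare" callbacks: immediately (call_soon) or
# after a delay (call_later), with extra args passed through.

import heapq
import itertools
import time


class Reactor:
    def __init__(self):
        self._queue = []                 # heap of (when, seq, callback, args)
        self._seq = itertools.count()    # tiebreaker preserving FIFO order

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._queue, (when, next(self._seq), callback, args))

    def run_once(self):
        # Run every callback whose scheduled time has arrived.
        now = time.monotonic()
        while self._queue and self._queue[0][0] <= now:
            _, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
```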
One reason to have a distinct method for running a bare callback is that you need to have some thread-safe entry point, but you otherwise don't really want locking on all the internal methods. Tornado's IOLoop.add_callback and Twisted's reactor.callFromThread can be used to run code in the IOLoop's thread (which can then call the other IOLoop methods). We also have distinct methods for running a callback after a timeout, although if you had a variant of add_handler that didn't require a subsequent call to remove_handler, you could probably do timeouts using a magical IO object. (An additional subtlety for the time-based methods is how time is computed. I recently added support in Tornado to optionally use time.monotonic instead of time.time.)
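The single thread-safe entry point described above is commonly built from a lock around the pending list plus a self-pipe to wake a loop that is blocked in select(). This mirrors the pattern behind Tornado's IOLoop.add_callback and Twisted's callFromThread, but the class below is an illustrative toy, not either library's implementation:

```python
# Toy sketch: add_callback is the only method safe to call from other
# threads; a lock guards the pending list, and a self-pipe wakes the loop.

import os
import select
import threading


class ToyLoop:
    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []
        self._waker_r, self._waker_w = os.pipe()

    def add_callback(self, callback, *args):
        # Thread-safe entry point: no other method needs locking.
        with self._lock:
            self._callbacks.append((callback, args))
        os.write(self._waker_w, b"x")   # interrupt a blocking select()

    def run_once(self, timeout=0.1):
        readable, _, _ = select.select([self._waker_r], [], [], timeout)
        if readable:
            os.read(self._waker_r, 4096)  # drain the waker pipe
        with self._lock:
            callbacks, self._callbacks = self._callbacks, []
        for callback, args in callbacks:
            callback(*args)
```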
- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.
Jython is another interesting case - it has a select() function that doesn't take integer file descriptors, just the opaque objects returned by socket.fileno(). While it's convenient to have higher-level constructors for various specialized types, I'd like to emphasize that having the low-level interface is important for interoperability. Tornado doesn't know whether the file descriptors are listening sockets, connected sockets, or pipes, so we'd just have to pass in a file descriptor with no other information.
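The low-level, fd-only interface argued for above can be as small as this: a dispatch step that knows nothing about what kind of object it is watching, only that select() accepts it (an integer fd, or an opaque object with fileno(), as on Jython). This is an illustrative sketch, not Tornado's actual code:

```python
# Toy sketch: dispatch on readability without knowing whether the
# watched objects are listening sockets, connected sockets, or pipes.

import select
import socket


def watch_until_readable(handlers, timeout=1.0):
    # handlers: {selectable: callback}; anything select() accepts works.
    readable, _, _ = select.select(list(handlers), [], [], timeout)
    for obj in readable:
        handlers[obj](obj)
```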
- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).
Why would you be allowed to make IO objects for sockets that don't work? I would expect that to just raise an exception. On app engine RPCs would be the only supported async I/O objects (and timers, if those are implemented as magic I/O objects), and they're not implemented in terms of sockets or files. -Ben