[Python-ideas] The async API of the future: Reactors
Ben Darnell
ben at bendarnell.com
Sat Oct 13 06:52:19 CEST 2012
On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum <guido at python.org> wrote:
> [This is the first spin-off thread from "asyncore: included batteries
> don't fit"]
>
> On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben at bendarnell.com> wrote:
>> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido at python.org> wrote:
>>>> Re base reactor interface: drawing maximally from the lessons learned in
>>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>>>
>>> That actually sounds more concrete than I'd like a reactor interface
>>> to be. In the App Engine world, there is a definite need for a
>>> reactor, but it cannot talk about file descriptors at all -- all I/O
>>> is defined in terms of RPC operations which have their own (several
>>> layers of) async management but still need to be plugged in to user
>>> code that might want to benefit from other reactor functionality such
>>> as scheduling and placing a call at a certain moment in the future.
>>
>> So are you thinking of something like
>> reactor.add_event_listener(event_type, event_params, func)? One thing
>> to keep in mind is that file descriptors are somewhat special (at
>> least in a level-triggered event loop), because of the way the event
>> will keep firing until the socket buffer is drained or the event is
>> unregistered. I'd be inclined to keep file descriptors in the
>> interface even if they just raise an error on app engine, since
>> they're fairly fundamental to the (unixy) event loop. On the other
>> hand, I don't have any experience with event loops outside the
>> unix/network world so I don't know what other systems might need for
>> their event loops.
>
> Hmm... This is definitely an interesting issue. I'm tempted to believe
> that it is *possible* to change every level-triggered setup into an
> edge-triggered setup by using an explicit loop -- but I'm not saying
> it is a good idea. In practice I think we need to support both equally
> well, so that the *app* can decide which paradigm to use. E.g. if I
> were to implement an HTTP server, I might use level-triggered for the
> "accept" call on the listening socket, but edge-triggered for
> everything else. OTOH someone else might prefer a buffered stream
> abstraction that just keeps filling its read buffer (and draining its
> write buffer) using level-triggered callbacks, at least up to a
> certain buffer size -- we have to be robust here and make it
> impossible for an evil client to fill up all our memory without our
> approval!
First of all, to clear up the terminology, edge-triggered actually has
a specific meaning in this context that is separate from the question
of whether callbacks are used more than once. The edge- vs
level-triggered question is moot with one-shot callbacks, but when
you're reusing callbacks in edge-triggered mode you won't get a second
call until you've drained the socket buffer and then it becomes
readable again. This turns out to be helpful for hybrid
event/threaded systems, since the network thread may go into the next
iteration of its loop while the worker thread is still consuming the
data from a previous event.
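
To make the distinction concrete, here is a rough sketch of what the read
path looks like under epoll's edge-triggered mode (Linux-only; the names
and structure are mine, just for illustration):

    import errno
    import select
    import socket

    # A connected, non-blocking socket pair just so the sketch is runnable.
    peer, sock = socket.socketpair()
    sock.setblocking(False)

    epoll = select.epoll()
    # EPOLLET = edge-triggered: readiness is reported once per transition,
    # so the handler must drain the kernel buffer until EAGAIN.
    epoll.register(sock.fileno(), select.EPOLLIN | select.EPOLLET)

    def on_readable():
        chunks = []
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.error as e:
                if e.args[0] in (errno.EAGAIN, errno.EWOULDBLOCK):
                    break  # drained; epoll won't fire again until new data
                raise
            if not chunk:
                break  # peer closed the connection
            chunks.append(chunk)
        return b"".join(chunks)

    peer.sendall(b"hello")
    for fd, events in epoll.poll():
        print(on_readable())
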
You can't always emulate edge-triggered behavior since it needs
knowledge of internal socket buffers (epoll has an edge-triggered mode
and I think kqueue does too, but you can't get edge-triggered behavior
if you're falling back to select()). However, you can easily get
one-shot callbacks from an event loop with persistent callbacks just
by unregistering the callback once it has received an event. This has
a performance cost, though - in tornado we try to avoid unnecessary
unregister/register pairs.
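
A sketch of that emulation, using Tornado-style add_handler/remove_handler
as the persistent-callback primitives (the wrapper itself is just
illustrative):

    def add_one_shot_handler(io_loop, fd, callback, events):
        # Emulate a one-shot callback on top of handlers that persist until
        # explicitly removed: unregister before invoking the user callback.
        def wrapper(fd, events):
            io_loop.remove_handler(fd)
            callback(fd, events)
        io_loop.add_handler(fd, wrapper, events)
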
>
> I'm not at all familiar with the Twisted reactor interface. My own
> design would be along the following lines:
>
> - There's an abstract Reactor class and an abstract Async I/O object
> class. To get a reactor to call you back, you must give it an I/O
> object, a callback, and maybe some more stuff. (I have gone back and
> forth on this, and currently like passing optional args for the
> callback, rather than requiring lambdas to create closures.) Note
> that the callback is *not* a
> designated method on the I/O object! In order to distinguish between
> edge-triggered and level-triggered, you just use a different reactor
> method. There could also be a reactor method to schedule a "bare"
> callback, either after some delay, or immediately (maybe with a given
> priority), although such functionality could also be implemented
> through magic I/O objects.
One reason to have a distinct method for running a bare callback is
that you need to have some thread-safe entry point, but you otherwise
don't really want locking on all the internal methods. Tornado's
IOLoop.add_callback and Twisted's Reactor.callFromThread can be used
to run code in the IOLoop's thread (which can then call the other
IOLoop methods).
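
Roughly like this (a simplified sketch of the pattern, not Tornado's
actual code):

    import os
    import threading

    class Loop(object):
        def __init__(self):
            self._callbacks = []
            self._lock = threading.Lock()
            # A pipe whose read end is registered with the poller, so that
            # writing a byte wakes the loop up from its poll() call.
            self._waker_r, self._waker_w = os.pipe()

        def add_callback(self, callback):
            # The only method that must be thread-safe: append under a lock
            # and wake the loop; everything else runs on the loop's thread.
            with self._lock:
                self._callbacks.append(callback)
            os.write(self._waker_w, b"x")

        def _run_callbacks(self):
            # Called from the loop's own thread on each iteration.
            with self._lock:
                callbacks, self._callbacks = self._callbacks, []
            for callback in callbacks:
                callback()
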
We also have distinct methods for running a callback after a timeout,
although if you had a variant of add_handler that didn't require a
subsequent call to remove_handler you could probably do timeouts using
a magical IO object. (An additional subtlety for the time-based
methods is how time is computed; I recently added support in tornado
to optionally use time.monotonic instead of time.time.)
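
A sketch of the usual approach, with a heap of deadlines and a monotonic
clock when the platform provides one (the names are mine):

    import heapq
    import itertools
    import time

    # Prefer a monotonic clock so scheduled callbacks aren't disturbed by
    # the system clock being adjusted.
    monotonic = getattr(time, "monotonic", time.time)

    class Timeouts(object):
        def __init__(self):
            self._heap = []
            self._counter = itertools.count()  # tiebreaker; callbacks never compared

        def call_later(self, delay, callback):
            heapq.heappush(self._heap,
                           (monotonic() + delay, next(self._counter), callback))

        def poll_timeout(self):
            # How long the event loop may block in select/epoll/kqueue.
            if not self._heap:
                return None
            return max(0.0, self._heap[0][0] - monotonic())

        def run_expired(self):
            now = monotonic()
            while self._heap and self._heap[0][0] <= now:
                deadline, tiebreak, callback = heapq.heappop(self._heap)
                callback()
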
>
> - In systems supporting file descriptors, there's a reactor
> implementation that knows how to use select/poll/etc., and there are
> concrete I/O object classes that wrap file descriptors. On Windows,
> those would only be socket file descriptors. On Unix, any file
> descriptor would do. To create such an I/O object you would use a
> platform-specific factory. There would be specialized factories to
> create e.g. listening sockets, connections, files, pipes, and so on.
>
Jython is another interesting case - it has a select() function that
doesn't take integer file descriptors, just the opaque objects
returned by socket.fileno().
While it's convenient to have higher-level constructors for various
specialized types, I'd like to emphasize that having the low-level
interface is important for interoperability. Tornado doesn't know
whether the file descriptors are listening sockets, connected sockets,
or pipes, so we'd just have to pass in a file descriptor with no other
information.
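
For example, registering a raw listening socket with Tornado's IOLoop
today looks roughly like this (a minimal sketch; error handling and
accepting in a loop until EAGAIN are omitted):

    import socket
    from tornado import ioloop

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(False)
    sock.bind(("127.0.0.1", 8888))
    sock.listen(128)

    io_loop = ioloop.IOLoop.instance()

    def on_readable(fd, events):
        # The loop only ever saw a file descriptor and an event mask; the
        # fact that it is a listening socket is known only to this callback.
        conn, addr = sock.accept()
        conn.close()

    io_loop.add_handler(sock.fileno(), on_readable, io_loop.READ)
    io_loop.start()
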
> - In systems like App Engine that don't support async I/O on file
> descriptors at all, the constructors for creating I/O objects for disk
> files and connection sockets would comply with the interface but fake
> out almost everything (just like today, using httplib or httplib2 on
> App Engine works by adapting them to a "urlfetch" RPC request).
Why would you be allowed to make IO objects for sockets that don't
work? I would expect that to just raise an exception. On App Engine,
RPCs would be the only supported async I/O objects (and timers, if
those are implemented as magic I/O objects), and they're not
implemented in terms of sockets or files.
-Ben