[This is the first spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell wrote:
On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote:
Re base reactor interface: drawing maximally from the lessons learned in Twisted, I think IReactorCore (start, stop, etc.), IReactorTime (call later, etc.), asynchronous-looking name lookup, and fd handling are the important parts.
That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future.
So are you thinking of something like reactor.add_event_listener(event_type, event_params, func)? One thing to keep in mind is that file descriptors are somewhat special (at least in a level-triggered event loop), because of the way the event will keep firing until the socket buffer is drained or the event is unregistered. I'd be inclined to keep file descriptors in the interface even if they just raise an error on app engine, since they're fairly fundamental to the (unixy) event loop. On the other hand, I don't have any experience with event loops outside the unix/network world so I don't know what other systems might need for their event loops.
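A rough sketch of that idea, with all class and method names invented here for illustration: the fd-registration methods stay in the reactor interface, and a platform without file descriptors simply refuses them at runtime.

```python
# Hypothetical sketch: generic event registration plus fd methods that
# non-unixy platforms (like App Engine) keep in the interface but reject.

class Reactor:
    def add_event_listener(self, event_type, event_params, func):
        """Generic registration; event_type and event_params are
        platform-defined."""
        raise NotImplementedError

    def add_reader(self, fd, callback):
        """Level-triggered: keep calling back while fd is readable."""
        raise NotImplementedError


class AppEngineReactor(Reactor):
    def __init__(self):
        self.listeners = {}

    def add_event_listener(self, event_type, event_params, func):
        # e.g. completion of an RPC operation
        key = (event_type, tuple(sorted(event_params.items())))
        self.listeners[key] = func

    def add_reader(self, fd, callback):
        # fds stay in the interface; this platform just raises.
        raise RuntimeError("no file descriptor support on this platform")
```

The design choice here is exactly the one debated above: keeping fd methods in the base interface makes unixy loops first-class, at the cost of runtime errors on platforms that cannot honor them.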
Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea. In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval!

I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:

- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and forth on this, but now I like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.

- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.

- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).
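A minimal sketch of that shape (all names here are my own invention, not a proposed API): the callback and its optional args travel through the reactor call rather than living on the I/O object, and edge- versus level-triggered registration is just two different reactor methods.

```python
# Hypothetical sketch of the reactor / I/O object / factory split.

class IOObject:
    """Abstract async I/O object: holds state, knows nothing of callbacks."""

class FDObject(IOObject):
    """Concrete I/O object wrapping a file descriptor."""
    def __init__(self, fd):
        self.fd = fd

class UnixFactory:
    """Platform-specific factory; a Windows factory would accept only
    socket file descriptors."""
    def from_fd(self, fd):
        return FDObject(fd)

class SelectReactor:
    """A reactor that would drive its registrations via select/poll."""
    def __init__(self):
        self.level = {}   # fd -> (callback, args)
        self.edge = {}

    # Edge- vs level-triggered is chosen by the method, not by the I/O
    # object, and *args spare the caller a lambda per closure.
    def call_when_readable(self, io, callback, *args):    # level-triggered
        self.level[io.fd] = (callback, args)

    def call_on_readable_edge(self, io, callback, *args): # edge-triggered
        self.edge[io.fd] = (callback, args)
```

On App Engine, a parallel factory would return objects satisfying the same IOObject interface while routing everything through RPCs.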
call_every can be implemented in terms of call_later on a separate object, so I think it should be (e.g. twisted.internet.task.LoopingCall). One thing that apparently tends to be forgotten is event loop integration. The prime way of having two event loops cooperate is *NOT* "run both in parallel", it's "have one call the other". Even though not all loops support this, I think it's important to get this as part of the interface (raise an exception for all I care if it doesn't work).
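As an illustration of why call_every needs no primitive support, here is a toy loop with a simulated clock (names invented) where call_every is nothing but a self-rescheduling call_later, which is essentially the trick LoopingCall uses:

```python
import heapq
import itertools

class Loop:
    """Toy event loop with a simulated clock, for illustration only."""
    def __init__(self):
        self._now = 0.0
        self._heap = []                    # (when, seq, func, args)
        self._seq = itertools.count()      # tie-breaker so funcs never compare

    def call_later(self, delay, func, *args):
        heapq.heappush(self._heap, (self._now + delay, next(self._seq), func, args))

    def call_every(self, interval, func, *args):
        def tick():
            func(*args)
            self.call_later(interval, tick)   # reschedule ourselves
        self.call_later(interval, tick)

    def run_until(self, deadline):
        while self._heap and self._heap[0][0] <= deadline:
            when, _, func, args = heapq.heappop(self._heap)
            self._now = when
            func(*args)
```

Running `run_until(3.5)` after `call_every(1.0, ...)` fires the callback at simulated times 1.0, 2.0, and 3.0.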
This is definitely one of the things we ought to get right. My own thoughts are slightly (perhaps only cosmetically) different again: ideally each event loop would have a primitive operation to tell it to run for a little while, and then some other code could tie several event loops together.
Possibly the primitive operation would be something like "block until either you've got one event ready, or until a certain time (possibly 0) has passed without any events, and then give us the events that are ready and a lower bound for when you might have more work to do" -- or maybe instead of returning the event(s) it could just call the associated callback (it might have to if it is part of a GUI library that has callbacks written in C/C++ for certain events like screen refreshes).
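A toy rendering of that primitive (simulated clock, invented names): each loop dispatches at most one due event per call and reports a lower bound for when it next has work, so some outer code can tie several loops together.

```python
class TickingLoop:
    """Toy loop exposing the 'run for a little while' primitive."""
    def __init__(self):
        self._timers = []   # sorted list of (when, callback)

    def call_at(self, when, callback):
        self._timers.append((when, callback))
        self._timers.sort(key=lambda t: t[0])

    def run_once(self, now):
        """Dispatch at most one event due by `now`; return the time of the
        next pending event (a lower bound for more work), or None if idle."""
        if self._timers and self._timers[0][0] <= now:
            _, callback = self._timers.pop(0)
            callback()
        return self._timers[0][0] if self._timers else None

def drive(loops, now):
    """Outer code tying several loops together: give each one slice."""
    return [loop.run_once(now) for loop in loops]
```

A real driver would sleep until the smallest returned bound instead of polling, and a GUI library's run_once would invoke its C/C++ callbacks internally rather than returning events.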
That doesn't work very well - while one loop is waiting for its timeout, nothing can happen on the other event loop. You have to switch back and forth frequently to keep things responsive, which is inefficient. I'd rather give each event loop its own thread; you can minimize the thread-synchronization concerns by picking one loop as "primary" and having all the others just pass callbacks over to it when their events fire.
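That hand-off can be sketched with a thread-safe queue -- a toy version of the pattern behind Twisted's callFromThread; the names below are invented:

```python
import queue
import threading

class PrimaryLoop:
    """The one loop that owns shared state and runs all callbacks."""
    def __init__(self):
        self._calls = queue.Queue()

    def call_from_thread(self, func, *args):
        # The only method secondary threads may touch.
        self._calls.put((func, args))

    def run(self, n_events):
        for _ in range(n_events):
            func, args = self._calls.get()   # blocks until an event arrives
            func(*args)                      # runs on the primary thread

def secondary_worker(primary, handler, payload):
    """Stands in for a GUI or network loop running in its own thread:
    when its event fires, it only posts a callback to the primary."""
    primary.call_from_thread(handler, payload)
```

Because secondary loops never run callbacks themselves, all user code executes on one thread and the synchronization surface shrinks to the queue.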
That's a good point. I suppose on systems that support both networking
and GUI events, in my design these would use different I/O objects
(created using different platform-specific factories) and the shared
reactor API would sort things out based on the type of I/O object
passed in to it.
Note that many GUI events would be level-triggered, but sometimes
using the edge-triggered paradigm can work well too: e.g. I imagine
that writing code to draw a curve following the mouse as long as a
button is pressed might be conveniently written as a loop of the form
def on_mouse_press(x, y, buttons):
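The message is cut off at this point. As a hedged reconstruction of the idea only (the `events` parameter and helper names below are invented, not from the original), the edge-triggered version reads as ordinary sequential code that pulls one event at a time:

```python
# Hypothetical sketch: an edge-triggered mouse gesture written as a plain
# loop. `events` stands in for whatever source the reactor would feed us.

def on_mouse_press(x, y, buttons, events):
    """Collect a curve for as long as a button stays pressed."""
    curve = [(x, y)]
    for x, y, buttons in events:   # each item is one edge-triggered event
        if not buttons:            # button released: the gesture is over
            break
        curve.append((x, y))
    return curve
```

The appeal is that the gesture's state lives in local variables across events, instead of being spread over separate level-triggered callbacks.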