[Python-ideas] The async API of the future: Reactors

Fri Oct 12 20:13:23 CEST 2012

[This is the first spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben at bendarnell.com> wrote:
> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido at python.org> wrote:
>>> Re base reactor interface: drawing maximally from the lessons learned in
>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>>
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
>
> So are you thinking of something like
> reactor.add_event_listener(event_type, event_params, func)?  One thing
> to keep in mind is that file descriptors are somewhat special (at
> least in a level-triggered event loop), because of the way the event
> will keep firing until the socket buffer is drained or the event is
> unregistered.  I'd be inclined to keep file descriptors in the
> interface even if they just raise an error on app engine, since
> they're fairly fundamental to the (unixy) event loop.  On the other
> hand, I don't have any experience with event loops outside the
> unix/network world so I don't know what other systems might need for
> their event loops.

Hmm... This is definitely an interesting issue. I'm tempted to believe
that it is *possible* to change every level-triggered setup into an
edge-triggered setup by using an explicit loop -- but I'm not saying
it is a good idea. In practice I think we need to support both equally
well, so that the *app* can decide which paradigm to use. E.g. if I
were to implement an HTTP server, I might use level-triggered for the
"accept" call on the listening socket, but edge-triggered for
everything else. OTOH someone else might prefer a buffered stream
abstraction that just keeps filling its read buffer (and draining its
write buffer) using level-triggered callbacks, at least up to a
certain buffer size -- we have to be robust here and make it
impossible for an evil client to fill up all our memory without our
approval!
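
A minimal sketch of the buffered-stream case, assuming a Unix-like
platform and the standard `selectors` module, whose readiness reports
are level-triggered. `ReadBuffer`, `MAX_BUFFER`, and `demo` are
hypothetical names used only for illustration, not part of any proposed
API. The read event fires again while unread data remains, and the
handler stops consuming once its cap is reached:

```python
import selectors
import socket

MAX_BUFFER = 8  # hypothetical cap, kept tiny so the demo reaches it


class ReadBuffer:
    """Accumulates bytes from a socket, never holding more than `limit`."""

    def __init__(self, sock, limit=MAX_BUFFER):
        self.sock = sock
        self.limit = limit
        self.data = bytearray()

    def on_readable(self):
        """Level-triggered callback: read at most the remaining room."""
        room = self.limit - len(self.data)
        if room <= 0:
            return False  # full: the caller should stop listening
        self.data += self.sock.recv(room)
        return True


def demo():
    a, b = socket.socketpair()
    b.setblocking(False)
    sel = selectors.DefaultSelector()
    buf = ReadBuffer(b)
    sel.register(b, selectors.EVENT_READ, buf.on_readable)

    a.send(b"0123456789abcdef")  # 16 bytes, twice the cap
    fired = 0
    while True:
        events = sel.select(timeout=0.1)
        if not events:
            break
        fired += 1
        key, _mask = events[0]
        if not key.data():
            # Buffer is full: unregister so the peer cannot make us
            # hold more than the cap.
            sel.unregister(b)
            break
    sel.close()
    a.close()
    b.close()
    return fired, bytes(buf.data)
```

The second firing happens without any new data arriving, which is the
level-triggered behaviour Ben describes; the remaining bytes stay in the
kernel's socket buffer rather than in application memory.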

I'm not at all familiar with the Twisted reactor interface. My own
design would be along the following lines:

- There's an abstract Reactor class and an abstract Async I/O object
class. To get a reactor to call you back, you must give it an I/O
object, a callback, and maybe some more stuff. (I have gone back to
liking optional args for the callback, rather than requiring
lambdas to create closures.) Note that the callback is *not* a
designated method on the I/O object! In order to distinguish between
edge-triggered and level-triggered, you just use a different reactor
method. There could also be a reactor method to schedule a "bare"
callback, either after some delay, or immediately (maybe with a given
priority), although such functionality could also be implemented
through magic I/O objects.

- In systems supporting file descriptors, there's a reactor
implementation that knows how to use select/poll/etc., and there are
concrete I/O object classes that wrap file descriptors. On Windows,
those would only be socket file descriptors. On Unix, any file
descriptor would do. To create such an I/O object you would use a
platform-specific factory. There would be specialized factories to
create e.g. listening sockets, connections, files, pipes, and so on.

- In systems like App Engine that don't support async I/O on file
descriptors at all, the constructors for creating I/O objects for disk
files and connection sockets would comply with the interface but fake
out almost everything (just like today, using httplib or httplib2 on
App Engine works by adapting them to a "urlfetch" RPC request).
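
The three points above could be sketched roughly as follows. Every
name here (`IOObject`, `Reactor`, `SocketIO`, `SelectReactor`,
`add_reader`, `read_once`, `run_once`) is hypothetical, and the sketch
approximates "edge-triggered" with a one-shot registration, which is an
assumption about the intent rather than something the design spells
out. It relies only on the standard `selectors` module:

```python
import abc
import heapq
import itertools
import selectors
import time


class IOObject(abc.ABC):
    """Hypothetical abstract I/O object handed to the reactor."""


class Reactor(abc.ABC):
    """Hypothetical abstract reactor. Callbacks are passed in together
    with optional args; they are not methods on the I/O object."""

    @abc.abstractmethod
    def add_reader(self, io, callback, *args):
        """Level-triggered: call back on every pass while io is readable."""

    @abc.abstractmethod
    def read_once(self, io, callback, *args):
        """One-shot: call back the next time io is readable, then forget."""

    @abc.abstractmethod
    def call_later(self, delay, callback, *args):
        """Schedule a bare callback after `delay` seconds."""


class SocketIO(IOObject):
    """Concrete I/O object wrapping a socket's file descriptor."""

    def __init__(self, sock):
        self.sock = sock

    def fileno(self):
        return self.sock.fileno()


class SelectReactor(Reactor):
    """Reactor for platforms with file descriptors (select/poll/etc.)."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self._timers = []                # heap of (when, seq, callback, args)
        self._seq = itertools.count()    # tie-breaker for equal deadlines

    def add_reader(self, io, callback, *args):
        self._sel.register(io, selectors.EVENT_READ, (callback, args, False))

    def read_once(self, io, callback, *args):
        self._sel.register(io, selectors.EVENT_READ, (callback, args, True))

    def remove_reader(self, io):
        self._sel.unregister(io)

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._timers, (when, next(self._seq), callback, args))

    def run_once(self, timeout=0.0):
        """Run one pass: dispatch ready I/O, then any due timers."""
        for key, _mask in self._sel.select(timeout):
            callback, args, once = key.data
            if once:
                self._sel.unregister(key.fileobj)
            callback(key.fileobj, *args)
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _when, _seq, callback, args = heapq.heappop(self._timers)
            callback(*args)
```

An App Engine style reactor would subclass `Reactor` with I/O objects
that have no `fileno()` at all, which is why the abstract `IOObject`
declares nothing about file descriptors.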

>>> call_every can be implemented in terms of call_later on a separate object,
>>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>>> that is apparently forgotten about is event loop integration. The prime way
>>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>>> "have one call the other". Even though not all loops support this, I think
>>> it's important to get this as part of the interface (raise an exception for
>>> all I care if it doesn't work).
>>
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>>
>> Possibly the primitive operation would be something like "block until
>> either you've got one event ready, or until a certain time (possibly
>> 0) has passed without any events, and then give us the events that are
>> ready and a lower bound for when you might have more work to do" -- or
>> maybe instead of returning the event(s) it could just call the
>> associated callback (it might have to if it is part of a GUI library
>> that has callbacks written in C/C++ for certain events like screen
>> refreshes).
>
> That doesn't work very well - while one loop is waiting for its
> timeout, nothing can happen on the other event loop.  You have to
> switch back and forth frequently to keep things responsive, which is
> inefficient.  I'd rather give each event loop its own thread; you can
> minimize the thread-synchronization concerns by picking one loop as
> "primary" and having all the others just pass callbacks over to it
> when their events fire.

That's a good point. I suppose on systems that support both networking
and GUI events, in my design these would use different I/O objects
(created using different platform-specific factories) and the shared
reactor API would sort things out based on the type of I/O object
passed in to it.
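
The thread-per-loop arrangement Ben suggests can be sketched with a
thread-safe queue. This is only an illustration under stated
assumptions: `PrimaryLoop`, `call_soon_threadsafe`, `secondary_loop`,
and `demo` are hypothetical names, and the secondary loop is a stand-in
that replays a list of events instead of waiting on a real GUI or
network source:

```python
import queue
import threading


class PrimaryLoop:
    """Hypothetical primary loop: every callback runs on its thread."""

    def __init__(self):
        self._inbox = queue.Queue()

    def call_soon_threadsafe(self, callback, *args):
        """May be called from any thread; it only enqueues the call."""
        self._inbox.put((callback, args))

    def stop(self):
        self._inbox.put(None)  # sentinel that ends run()

    def run(self):
        while True:
            item = self._inbox.get()
            if item is None:
                break
            callback, args = item
            callback(*args)


def secondary_loop(primary, handler, events):
    """Stand-in for another event loop on its own thread. It never runs
    user callbacks itself; it hands them over to the primary loop."""
    for event in events:
        primary.call_soon_threadsafe(handler, event)
    primary.stop()


def demo():
    primary = PrimaryLoop()
    owner = threading.get_ident()
    seen = []

    def handler(event):
        # Record whether the callback ran on the primary loop's thread.
        seen.append((event, threading.get_ident() == owner))

    worker = threading.Thread(
        target=secondary_loop, args=(primary, handler, ["click", "key"])
    )
    worker.start()
    primary.run()
    worker.join()
    return seen
```

Because user callbacks only ever run inside `PrimaryLoop.run`, they need
no locking of their own; the queue is the single synchronization point.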

Note that many GUI events would be level-triggered, but sometimes
using the edge-triggered paradigm can work well too: e.g. I imagine
that writing code to draw a curve following the mouse as long as a
button is pressed might be conveniently written as a loop of the form

def on_mouse_press(x, y, buttons):
  <set up polygon starting at current x, y>
  while True:
    x, y, buttons = yield <get mouse event>
    if not buttons:
      break
    <extend polygon to x, y>
  <finish polygon>

which itself is registered as a level-triggered handler for mouse
presses. (Dealing with multiple buttons is left as an exercise. :-)
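
A runnable version of that loop, as a sketch only: the `finished` list
and the `dispatch` driver are hypothetical stand-ins for a real GUI
toolkit, and events are plain `(x, y, buttons)` tuples. The handler is
started for each press and is then fed one event per `send()`:

```python
def on_mouse_press(x, y, buttons, finished):
    """Generator handler: builds one polygon per press-drag-release."""
    polygon = [(x, y)]                # set up polygon at the press point
    while True:
        x, y, buttons = yield         # wait for the next mouse event
        if not buttons:
            break                     # button released
        polygon.append((x, y))        # extend polygon to x, y
    finished.append(polygon)          # finish polygon


def dispatch(events, finished):
    """Toy event source: starts a handler on each press and feeds it
    every following event until the handler returns."""
    active = None
    for x, y, buttons in events:
        if active is None:
            if buttons:
                active = on_mouse_press(x, y, buttons, finished)
                next(active)          # advance to the first yield
        else:
            try:
                active.send((x, y, buttons))
            except StopIteration:
                active = None         # gesture complete


finished = []
dispatch([(0, 0, 1), (1, 1, 1), (2, 3, 1), (2, 3, 0), (5, 5, 0)], finished)
```

Here `finished` ends up holding a single polygon with the three points
visited while the button was down; the trailing event with no button
pressed is ignored because no handler is active.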

-- 
--Guido van Rossum (python.org/~guido)