[Python-ideas] Async API: some code to review

Mon Oct 29 17:07:31 CET 2012

Hello Guido,

On Sun, 28 Oct 2012 16:52:02 -0700,
Guido van Rossum <guido at python.org>
wrote:
> 
> The event list started out as a tuple of (fd, flag, callback, args),
> where flag is 'r' or 'w' (easily extensible); in practice neither the
> fd nor the flag are used, and one of the last things I did was to wrap
> callback and args into a simple object that allows cancelling the
> callback; the add_*() methods return this object. (This could probably
> use a little more abstraction.) Note that poll() doesn't call the
> callbacks -- that's up to the event loop.

I don't understand why the pollster takes callback objects if it never
calls them. The fact that it wraps them in DelayedCalls is even more
mysterious to me: DelayedCalls represent one-time, cancellable callbacks
with a given deadline, not callbacks which are called any number of
times on I/O events and which you can't cancel.
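To make the distinction concrete, here is a minimal sketch that assumes a select()-based pollster. Pollster, EventLoop and the demo() helper are hypothetical names. DelayedCall is tulip's name, but the version below is only an illustration and not tulip's code. In this sketch the pollster reports which (fd, flag) pairs are ready and the loop owns the callbacks. An I/O handler fires every time its fd is ready. A DelayedCall runs at most once and can be cancelled.

```python
import heapq
import itertools
import select
import socket
import time


class Pollster:
    """Tracks file descriptors only; it never sees or calls callbacks."""

    def __init__(self):
        self.fds = {'r': set(), 'w': set()}

    def register(self, fd, flag):
        self.fds[flag].add(fd)

    def unregister(self, fd, flag):
        self.fds[flag].discard(fd)

    def poll(self, timeout=None):
        # Report which (fd, flag) pairs are ready; dispatching is the loop's job.
        r, w, _ = select.select(list(self.fds['r']), list(self.fds['w']), [], timeout)
        return [(fd, 'r') for fd in r] + [(fd, 'w') for fd in w]


class DelayedCall:
    """One-shot callback with a deadline; can be cancelled before it runs."""

    def __init__(self, when, callback, *args):
        self.when, self.callback, self.args = when, callback, args
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class EventLoop:
    def __init__(self):
        self.pollster = Pollster()
        self.handlers = {}  # (fd, flag) -> (callback, args), run on *every* readiness
        self.timers = []    # heap of (when, seq, DelayedCall), each run at most once
        self._seq = itertools.count()

    def add_reader(self, fd, callback, *args):
        self.handlers[(fd, 'r')] = (callback, args)
        self.pollster.register(fd, 'r')

    def remove_reader(self, fd):
        self.handlers.pop((fd, 'r'), None)
        self.pollster.unregister(fd, 'r')

    def call_later(self, delay, callback, *args):
        dc = DelayedCall(time.monotonic() + delay, callback, *args)
        heapq.heappush(self.timers, (dc.when, next(self._seq), dc))
        return dc

    def run_once(self, timeout=0.0):
        for key in self.pollster.poll(timeout):
            handler = self.handlers.get(key)
            if handler is not None:  # may have been removed by an earlier callback
                callback, args = handler
                callback(*args)
        now = time.monotonic()
        while self.timers and self.timers[0][0] <= now:
            _, _, dc = heapq.heappop(self.timers)
            if not dc.cancelled:
                dc.callback(*dc.args)


def demo():
    loop = EventLoop()
    a, b = socket.socketpair()
    events, fired = [], []
    try:
        loop.add_reader(a.fileno(), events.append, 'readable')
        b.send(b'x')                  # data stays unread, so the fd stays ready
        loop.run_once(timeout=1.0)
        loop.run_once(timeout=1.0)    # the I/O callback fires again
        loop.call_later(0, fired.append, 'timer')
        loop.call_later(0, fired.append, 'cancelled').cancel()
        loop.run_once()
        loop.run_once()               # a DelayedCall never fires twice
        loop.remove_reader(a.fileno())
    finally:
        a.close()
        b.close()
    return events, fired
```

Calling demo() should record four 'readable' events and a single 'timer'. The I/O handler fires repeatedly, and the cancelled DelayedCall never runs.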

> scheduling.py:
> http://code.google.com/p/tulip/source/browse/scheduling.py
> 
> This is the scheduler for PEP-380 style coroutines. I started with a
> Scheduler class and operations along the lines of Greg Ewing's design,
> with a Scheduler instance as a global variable, but ended up ripping
> it out in favor of a Task object that represents a single stack of
> generators chained via yield-from. There is a Context object holding
> the event loop and the current task in thread-local storage, so that
> multiple threads can (and must) have independent event loops.

YMMV, but I tend to be wary of implicit thread-local storage. What if
someone runs a function or method depending on that thread-local
storage from inside a thread pool? Weird bugs ensue.

I think explicit context is much less error-prone. Even a single global
instance (like Twisted's reactor) would be better :-)
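The thread-pool hazard can be shown in a minimal sketch. It assumes the loop lives in a module-level threading.local(). set_event_loop, get_event_loop, schedule_implicit and schedule_explicit are hypothetical names and not tulip's API, and a string stands in for a real loop object. When the implicit lookup runs in a pool worker it silently finds nothing. The explicit version works from any thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()


def set_event_loop(loop):
    _context.loop = loop


def get_event_loop():
    # Implicit lookup: only sees a loop that was set by *this* thread.
    return getattr(_context, 'loop', None)


def schedule_implicit(name):
    """Relies on thread-local state to find the loop."""
    loop = get_event_loop()
    return None if loop is None else (loop, name)


def schedule_explicit(loop, name):
    """Receives the loop as an argument; works from any thread."""
    return (loop, name)


if __name__ == '__main__':
    set_event_loop('main-loop')         # stand-in for a real event loop object
    print(schedule_implicit('job'))     # ('main-loop', 'job')
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(pool.submit(schedule_implicit, 'job').result())               # None
        print(pool.submit(schedule_explicit, 'main-loop', 'job').result())  # ('main-loop', 'job')
```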

As for the rest of the scheduling module, I can't say much since I have
a hard time reading and understanding it.

> To invoke a primitive I/O operation, you call the current task's
> block() method and then immediately yield (similar to Greg Ewing's
> approach). There are helpers block_r() and block_w() that arrange for
> a task to block until a file descriptor is ready for reading/writing.
> Examples of their use are in sockets.py.

That's weird and kind of ugly IMHO. Why would you write:

        scheduling.block_w(self.sock.fileno())
        yield

instead of say:

        yield scheduling.block_w(self.sock.fileno())

?
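Here is a minimal sketch of the single-statement style. It assumes block_r()/block_w() return a request object that travels with the yield, instead of mutating the current task before a bare yield. BlockRequest, run_until_done and send_all are hypothetical names. The toy driver also resumes the coroutine immediately rather than waiting on a pollster.

```python
class BlockRequest:
    """Yielded by a coroutine to ask the scheduler to wait on an fd."""

    def __init__(self, fd, flag):
        self.fd = fd
        self.flag = flag  # 'r' or 'w'


def block_r(fd):
    return BlockRequest(fd, 'r')


def block_w(fd):
    return BlockRequest(fd, 'w')


def run_until_done(gen):
    """Drive a coroutine, recording every request it yields.

    A real scheduler would hand each request to the pollster and resume the
    task once the fd is ready; this toy version resumes it immediately.
    """
    requests = []
    try:
        request = next(gen)
        while True:
            if not isinstance(request, BlockRequest):
                raise TypeError(f'unexpected yield: {request!r}')
            requests.append((request.fd, request.flag))
            request = gen.send(None)
    except StopIteration as exc:
        return requests, exc.value


def send_all(fd, chunks):
    """Hypothetical coroutine: waits for writability before each chunk."""
    sent = 0
    for chunk in chunks:
        yield block_w(fd)    # one statement: the request *is* the yielded value
        sent += len(chunk)   # a real transport would call sock.send(chunk) here
    return sent
```

With this style the scheduler sees exactly what each task is waiting for, and a task cannot forget to yield after calling block_w().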

Also, the fact that each call to SocketTransport.{recv,send} explicitly
registers and then unregisters the fd with the event loop looks wasteful.
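The register-once alternative can be sketched with the selectors module (Python 3.4+, which postdates this thread), used here as a stand-in for tulip's pollster. count_wakeups is a hypothetical helper. The socket is registered a single time, and that registration serves several reads instead of being added and removed around each recv().

```python
import selectors
import socket


def count_wakeups(rounds=3):
    """Register a socket once and reuse the registration across several reads."""
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    a.setblocking(False)
    sel.register(a, selectors.EVENT_READ)  # one registration for the socket's lifetime
    wakeups = 0
    try:
        for _ in range(rounds):
            b.send(b'x')
            for key, mask in sel.select(timeout=1.0):
                if mask & selectors.EVENT_READ:
                    key.fileobj.recv(1)    # consume the byte so the fd goes idle again
                    wakeups += 1
    finally:
        sel.unregister(a)
        sel.close()
        a.close()
        b.close()
    return wakeups
```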

By the way, even when an fd is signalled ready, you must still be
prepared for recv() to return EAGAIN (see
http://bugs.python.org/issue9090).
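A minimal sketch of the defensive pattern follows, assuming a non-blocking socket. try_recv and demo are hypothetical helpers. In Python 3.3+, EAGAIN/EWOULDBLOCK is raised as BlockingIOError, and the caller should read it as "wait for readiness again", not as an error.

```python
import select
import socket


def try_recv(sock, bufsize=4096):
    """Non-blocking recv(): return the data, or None if it would block.

    Readiness reported by select()/poll() is only a hint, so this path must
    be handled even right after the fd was reported readable.
    """
    try:
        return sock.recv(bufsize)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK
        return None


def demo():
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        before = try_recv(a)             # nothing sent yet, so this would block
        b.sendall(b'hi')
        select.select([a], [], [], 1.0)  # wait until a is readable
        after = try_recv(a)
        return before, after
    finally:
        a.close()
        b.close()
```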

> In the docstrings I use the prefix "COROUTINE:" to indicate public
> APIs that should be invoked using yield from.

Hmm, should they? Your approach looks a bit weird: some functions must
be invoked with "yield", and others with "yield from"? That sounds
confusing to me.

I'd much rather either have all functions use "yield", or have all
functions use "yield from".

(also, I wouldn't be shocked if coroutines had to wear a special
decorator; it's a better marker than having the word COROUTINE in the
docstring, anyway :-))
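Here is one possible shape for such a marker, as a sketch. The coroutine decorator, is_coroutine_function, read_line, caller and run are hypothetical names and not tulip's API. The decorator stores the marker as a function attribute that a scheduler or a linter could check. Every coroutine is then invoked the same way, with "yield from".

```python
import inspect


def coroutine(func):
    """Mark a generator function as a coroutine meant for "yield from"."""
    if not inspect.isgeneratorfunction(func):
        raise TypeError(f'{func.__name__} is not a generator function')
    func._is_coroutine = True  # checkable marker, unlike a docstring note
    return func


def is_coroutine_function(func):
    return getattr(func, '_is_coroutine', False)


@coroutine
def read_line(chunks):
    """Hypothetical coroutine: accumulates chunks until a newline is seen."""
    buf = b''
    for chunk in chunks:
        buf += chunk
        if b'\n' in buf:
            break
        yield  # a real version would wait for more data here
    return buf


@coroutine
def caller():
    line = yield from read_line([b'ab', b'c\n'])
    return line


def run(gen):
    """Drive a coroutine to completion, ignoring what it yields."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value
```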

> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
> 
> This implements some internet primitives using the APIs in
> scheduling.py (including block_r() and block_w()). I call them
> transports but they are different from Twisted's transports; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

SslTransport.{recv,send} need the same kind of logic as do_handshake():
catch both SSLWantReadError and SSLWantWriteError, and call block_r /
block_w accordingly.
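Here is a sketch of that logic for recv(). It assumes block_r()/block_w() return request objects the scheduler understands, modeled below as plain tuples. ssl_recv, drive and FakeSSLSocket are hypothetical. The fake socket only simulates the two exceptions, so the example runs without a TLS peer. send() would follow the same pattern.

```python
import ssl


def block_r(fd):
    return ('r', fd)  # hypothetical request: wait until fd is readable


def block_w(fd):
    return ('w', fd)  # hypothetical request: wait until fd is writable


def ssl_recv(sslsock, bufsize=4096):
    """Coroutine: retry recv() until the SSL layer can make progress.

    An SSL read may first need to *write* (e.g. during renegotiation), so
    both "want" exceptions are handled, not only SSLWantReadError.
    """
    while True:
        try:
            return sslsock.recv(bufsize)
        except ssl.SSLWantReadError:
            yield block_r(sslsock.fileno())
        except ssl.SSLWantWriteError:
            yield block_w(sslsock.fileno())


def drive(gen):
    """Resume the coroutine immediately after each request, recording them."""
    requests = []
    try:
        request = next(gen)
        while True:
            requests.append(request)
            request = gen.send(None)
    except StopIteration as exc:
        return requests, exc.value


class FakeSSLSocket:
    """Simulates an SSL socket that needs one read and one write first."""

    def __init__(self):
        self._pending = [ssl.SSLWantReadError(), ssl.SSLWantWriteError()]

    def fileno(self):
        return 3

    def recv(self, bufsize):
        if self._pending:
            raise self._pending.pop(0)
        return b'payload'
```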

> Then there is a
> BufferedReader class that implements more traditional read() and
> readline() coroutines (i.e., to be invoked using yield from), the
> latter handy for line-oriented transports.

Well... It would be nice if BufferedReader could re-use the actual
io.BufferedReader and its fast readline(), read(), readinto()
implementations.
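Here is a sketch of how that reuse could look: wrap the incoming bytes in an io.RawIOBase subclass and let io.BufferedReader handle buffering and line splitting. ChunkRaw is hypothetical. It is fed pre-received chunks so the example stays synchronous. Integration with a non-blocking recv(), where readinto() would return None, is not shown.

```python
import io


class ChunkRaw(io.RawIOBase):
    """Raw stream that hands out pre-received chunks via readinto()."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def readable(self):
        return True  # io.BufferedReader refuses raw streams that aren't readable

    def readinto(self, b):
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks.pop(0)
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            self._chunks.insert(0, chunk[n:])  # keep the remainder for the next call
        return n


reader = io.BufferedReader(ChunkRaw([b'HTTP/1.0 200', b' OK\r\nHea', b'der: x\r\n']))
status = reader.readline()  # lines spanning chunk boundaries are reassembled
header = reader.readline()
```

The line-splitting loop here is the C implementation's readline(), not a Python reimplementation.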

Regards

Antoine.