On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 28 Oct 2012 16:52:02 -0700, Guido van Rossum <guido@python.org> wrote:
The event list started out as a tuple of (fd, flag, callback, args), where flag is 'r' or 'w' (easily extensible); in practice neither the fd nor the flag are used, and one of the last things I did was to wrap callback and args into a simple object that allows cancelling the callback; the add_*() methods return this object. (This could probably use a little more abstraction.) Note that poll() doesn't call the callbacks -- that's up to the event loop.
I don't understand why the pollster takes callback objects if it never calls them. Also the fact that it wraps them into DelayedCalls is more mysterious to me. DelayedCalls represent one-time cancellable callbacks with a given deadline, not callbacks which are called any number of times on I/O events and that you can't cancel.
Yeah, this part definitely needs reworking. In the current design the pollster is a base class of the event loop, and the latter *does* call them; but I want to refactor that anyway. I'll probably end up with a pollster that registers (what are to it) opaque tokens and returns just a list of tokens from poll(). (Unrelated: would it be useful if poll() was an iterator?)
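A minimal sketch of that refactoring, assuming a pollster that only registers opaque tokens and returns the ready ones from poll(); the class and method names here are illustrative, not the actual tulip code:

    import select

    class TokenPollster:
        """Illustrative pollster: it never calls anything, it only reports
        which registered tokens are ready; the event loop does the calling."""

        def __init__(self):
            self.readers = {}   # fd -> opaque token
            self.writers = {}   # fd -> opaque token

        def register_reader(self, fd, token):
            self.readers[fd] = token

        def register_writer(self, fd, token):
            self.writers[fd] = token

        def unregister_reader(self, fd):
            self.readers.pop(fd, None)

        def unregister_writer(self, fd):
            self.writers.pop(fd, None)

        def poll(self, timeout=None):
            # select() is the lowest common denominator; poll/epoll/kqueue
            # variants could share this same token-based interface.
            r, w, _ = select.select(self.readers, self.writers, [], timeout)
            return [self.readers[fd] for fd in r] + [self.writers[fd] for fd in w]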
scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py
This is the scheduler for PEP-380 style coroutines. I started with a Scheduler class and operations along the lines of Greg Ewing's design, with a Scheduler instance as a global variable, but ended up ripping it out in favor of a Task object that represents a single stack of generators chained via yield-from. There is a Context object holding the event loop and the current task in thread-local storage, so that multiple threads can (and must) have independent event loops.
YMMV, but I tend to be wary of implicit thread-local storage. What if someone runs a function or method depending on that thread-local storage from inside a thread pool? Weird bugs ensue.
Agreed, I had to figure out one of these in the implementation of call_in_thread() and it wasn't fun. I don't know what else to do -- I think it's probably best if I base my implementation on this for now so that I know it works correctly in such an environment. In the end there will probably be an API to get the current context and another to influence how that API gets it, so people can plug in their own schemes, from TLS to a simple global to something determined by an external library.
I think explicit context is much less error-prone. Even a single global instance (like Twisted's reactor) would be better :-)
I find that passing the context around everywhere makes for awkward APIs though.
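A minimal sketch of such a pluggable "current context" accessor, defaulting to thread-local storage; the names are hypothetical, not the actual scheduling.py API:

    import threading

    class Context:
        """Per-thread container for the event loop and the running task."""
        def __init__(self):
            self.eventloop = None
            self.current_task = None

    _tls = threading.local()

    def _tls_context():
        # Default scheme: one lazily created Context per thread.
        if not hasattr(_tls, 'context'):
            _tls.context = Context()
        return _tls.context

    # Plug-in point: replace this with a plain global, or with something
    # driven by an external framework, if thread-local storage is the
    # wrong fit (e.g. code running on a thread pool).
    context_factory = _tls_context

    def current_context():
        return context_factory()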
As for the rest of the scheduling module, I can't say much since I have a hard time reading and understanding it.
That's a problem, I need to write this up properly so that everyone can understand it.
To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py.
That's weird and kind of ugly IMHO. Why would you write:
scheduling.block_w(self.sock.fileno())
yield
instead of say:
yield scheduling.block_w(self.sock.fileno())
?
This has been debated ad nauseam already (be glad you missed it); basically, there's not a whole lot of difference, but if some APIs require "yield X(args)" and others require "yield from Y(args)", that's really confusing. The "bare yield only" rule makes it possible (though I didn't implement it here) to put some strict checks in the scheduler -- next() should never return anything except None. But there are other ways to do that too. Anyway, I probably will change the API so that e.g. sockets.py doesn't have to use this paradigm; I'll just wrap these low-level APIs in a proper "coroutine" and then sockets.py can just use "yield from block_r(fd)". (This is one reason why I like the "bare generators with yield from" approach that Greg Ewing and PEP 380 recommend: it's really cheap to wrap an API in an extra layer of yield-from. See the yyftime.py benchmark I added to the tulip directory.)
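A minimal sketch of that wrapping, using the block_r()-plus-bare-yield convention described earlier (the wrapper name block_r_coro() is made up for illustration):

    def block_r_coro(fd):
        """COROUTINE: block the current task until fd is readable."""
        block_r(fd)   # existing internal helper: registers fd with the pollster
        yield         # bare yield hands control back to the scheduler

    # sockets.py could then drop the two-step dance and just write:
    def recv(self, n):
        """COROUTINE: like socket.recv(), but suspends the task, not the thread."""
        yield from block_r_coro(self.sock.fileno())
        return self.sock.recv(n)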
Also, the fact that each call to SocketTransport.{recv,send} explicitly registers then removes the fd on the event loop looks wasteful.
I am hoping to add some optimization for this -- I am actually planning a hackathon (or re-education session :-) with some Twisted folks where I hope they'll explain to me how they do this.
By the way, even when a fd is signalled ready, you must still be prepared for recv() to return EAGAIN (see http://bugs.python.org/issue9090).
Yeah, I should know, I ran into this for a Google project too (there was a kernel driver that was lying...). I had a cryptic remark in my post above referring to this.
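A minimal sketch of the defensive pattern Antoine describes -- retry when the kernel reports readiness but recv() still raises EAGAIN -- assuming the hypothetical block_r_coro() wrapper sketched above:

    def recv(self, n):
        """COROUTINE: read up to n bytes, tolerating spurious readiness."""
        while True:
            yield from block_r_coro(self.sock.fileno())   # hypothetical wrapper, see above
            try:
                return self.sock.recv(n)
            except (BlockingIOError, InterruptedError):
                # The fd was reported ready but the data wasn't really there
                # (or the call was interrupted); go back to waiting.
                # See http://bugs.python.org/issue9090.
                continue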
In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from.
Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me.
Yeah, see above.
I'd much rather either have all functions use "yield", or have all functions use "yield from".
Agreed, and I'm strongly in favor of "yield from". The block_r() + bare yield combination is considered an *internal* API.
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-))
Agreed it would be useful as documentation, and maybe an API can use this to enforce proper coding style. It would have to be purely decoration though -- I don't want an extra layer of wrapping to occur each time you call a coroutine. (I.e. the decorator should just return "func".)
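A minimal sketch of such a purely decorative marker (hypothetical, not in tulip as it stands):

    def coroutine(func):
        """Mark func as a coroutine that must be invoked with 'yield from'.

        Returns func unchanged, so there is no wrapper and no per-call
        overhead; a debug-mode scheduler could inspect the attribute to
        enforce proper usage.
        """
        func._is_coroutine = True
        return func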
sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py
This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports, but they are different from Twisted's transports; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from. SslTransport wraps an ssl socket (luckily, in Python 2.6 and up, stdlib ssl sockets have good async support!).
SslTransport.{recv,send} need the same kind of logic as do_handshake(): catch both SSLWantReadError and SSLWantWriteError, and call block_r / block_w accordingly.
Oh... Thanks for the tip. I didn't find this in the ssl module docs.
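For the record, a minimal sketch of what that might look like in SslTransport.recv(), using the block_r()/block_w() + bare-yield convention from scheduling.py; the attribute name self.sslsock is an assumption, and ssl.SSLWantReadError / ssl.SSLWantWriteError are new in Python 3.3:

    import ssl

    def recv(self, n):
        """COROUTINE: recv() on an SSL socket, mirroring the do_handshake() logic."""
        while True:
            try:
                return self.sslsock.recv(n)
            except ssl.SSLWantReadError:
                scheduling.block_r(self.sslsock.fileno())
                yield
            except ssl.SSLWantWriteError:
                # Renegotiation may require a write even in the middle of a read.
                scheduling.block_w(self.sslsock.fileno())
                yield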
Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports.
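(For reference, a minimal sketch of a coroutine-style readline() of the kind described here; the buffering details and names are illustrative, not the actual tulip BufferedReader:)

    class BufferedReader:
        """Illustrative reader over a transport whose recv() is a coroutine;
        all methods must be invoked with 'yield from'."""

        def __init__(self, transport, limit=8192):
            self.transport = transport
            self.limit = limit
            self.buffer = b''
            self.eof = False

        def readline(self):
            """COROUTINE: read one line, keeping the trailing newline if present."""
            while b'\n' not in self.buffer and not self.eof:
                data = yield from self.transport.recv(self.limit)
                if not data:
                    self.eof = True
                    break
                self.buffer += data
            line, sep, rest = self.buffer.partition(b'\n')
            self.buffer = rest
            return line + sep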
Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations.
Agreed, I would love that too, but the problem is, *this* BufferedReader defines methods you have to invoke with yield from. Maybe we can come up with a solution for sharing code by modifying the _io module though; that would be great! (I've also been thinking of layering TextIOWrapper on top of these.)

Thanks for the thorough review!

--
--Guido van Rossum (python.org/~guido)