
Work priorities don't allow me to spend another day replying in detail to the various emails on this topic, but I am still keeping up reading!
I have read Greg's response to my comparison between Future+yield-based coroutines and his yield-from-based, Future-free coroutines, and after writing a small prototype, I am now pretty much convinced that Greg's way is superior. This doesn't mean you can't use generators or yield-from for other purposes! It's just that *if* you are writing a coroutine for use with a certain scheduler, you must use yield and yield-from in accordance with the scheduler's rules. However, code you call can still use yield and yield-from for iteration, and you can still use for-loops. In particular, if f is a coroutine, it can still write "for x in g(): ..." where g is a generator meant to be an iterator. However, if g were instead a coroutine, f should call it using "yield from g()", and f and g should agree on the interface of their scheduler.
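A minimal sketch of that distinction, assuming a toy round-robin scheduler whose only rule is that a bare `yield` means "let other tasks run". This rule and the names `squares`, `child`, `parent` and `run` are hypothetical; they are not Greg's actual scheduler interface. `parent` iterates an ordinary generator with a for-loop, which the scheduler never sees, and calls a coroutine with `yield from`, which suspends `parent` for as long as the child is suspended:

```python
from collections import deque

def squares(n):
    """An ordinary generator used as an iterator, not a coroutine."""
    for i in range(n):
        yield i * i

def child(name, log):
    """A coroutine. Assumed scheduler rule: a bare `yield` lets other tasks run."""
    log.append(f"{name}: child start")
    yield
    log.append(f"{name}: child end")
    return 42

def parent(name, log):
    """A coroutine that uses a for-loop for iteration and yield from for a sub-coroutine."""
    total = 0
    for x in squares(4):        # plain iteration; the scheduler never sees these yields
        total += x
    log.append(f"{name}: sum={total}")
    result = yield from child(name, log)   # suspends parent while child is suspended
    log.append(f"{name}: child returned {result}")

def run(tasks):
    """Toy round-robin scheduler: resume each task until it yields, then requeue it."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue            # task finished; drop it
        queue.append(task)

log = []
run([parent("A", log), parent("B", log)])
```

Tasks A and B interleave only at the child's bare `yield`; the for-loop over `squares()` always runs to completion without handing control back to the scheduler.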
As to other topics, my current feeling is that we should try to separately develop requirements and prototype implementations of the I/O loop of the future, and to figure out the loosest possible coupling between that and a coroutine scheduler (or any other type of scheduler). In particular, I think the I/O loop should not assume the event handlers are implemented using coroutines -- but if someone wants to write an awesome coroutine scheduler, they should be able to delegate all their I/O waiting needs to the I/O loop with very little trouble.
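To illustrate that loose coupling, here is a sketch in which the loop understands nothing but plain callbacks (`call_soon`, `call_later`), while a separate, hypothetical `Task` class drives a coroutine by registering its own `_step` method as a callback. The convention that a coroutine yields the number of seconds it wants to sleep is an assumption made only for this example, and a real loop would also wait on sockets rather than just on timers:

```python
import heapq
import itertools
import time

class CallbackLoop:
    """Knows only about plain callbacks; nothing in here is coroutine-specific."""

    def __init__(self):
        self._timers = []               # heap of (deadline, seq, callback, args)
        self._seq = itertools.count()   # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback, *args):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def run(self):
        while self._timers:
            deadline, _, callback, args = heapq.heappop(self._timers)
            pause = deadline - time.monotonic()
            if pause > 0:
                time.sleep(pause)       # a real loop would block in select() instead
            callback(*args)

class Task:
    """Hypothetical coroutine driver that delegates all waiting to the loop.
    Assumed convention: the coroutine yields how many seconds it wants to sleep."""

    def __init__(self, loop, coro):
        self._loop = loop
        self._coro = coro
        self.result = None
        loop.call_soon(self._step)

    def _step(self):
        try:
            delay = next(self._coro)
        except StopIteration as exc:
            self.result = exc.value     # the coroutine's return value
            return
        self._loop.call_later(delay, self._step)

def napper(name, delay, log):
    log.append(f"{name} sleeping")
    yield delay
    log.append(f"{name} awake")
    return name

log = []
loop = CallbackLoop()
slow = Task(loop, napper("slow", 0.05, log))
fast = Task(loop, napper("fast", 0.01, log))
loop.run()
```

In this arrangement, replacing the coroutine scheduler means rewriting only `Task`; `CallbackLoop` stays unchanged.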
To me, this means that the I/O loop probably should use "plain" callback functions (i.e., not Futures, Deferreds or coroutines). We should also standardize the interface to the I/O loop so that 3rd parties can plug in their own I/O loop -- I don't see an end to the debate over whether the best C library for event handling is libevent, libev or libuv.
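A sketch of what a pluggable, callback-based interface might look like. The `EventLoop` base class and its method names are hypothetical; `SelectorLoop` is one possible implementation on top of the stdlib `selectors` module, and a wrapper around libevent, libev or libuv would implement the same three methods:

```python
import abc
import selectors
import socket

class EventLoop(abc.ABC):
    """Hypothetical minimal interface that any I/O loop implementation could provide."""

    @abc.abstractmethod
    def add_reader(self, fd, callback, *args):
        """Arrange for callback(*args) to be called whenever fd is readable."""

    @abc.abstractmethod
    def remove_reader(self, fd):
        """Stop watching fd."""

    @abc.abstractmethod
    def run_once(self, timeout=None):
        """Wait (at most timeout seconds) for readiness and dispatch callbacks."""

class SelectorLoop(EventLoop):
    """One possible implementation, built on the stdlib selectors module."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def add_reader(self, fd, callback, *args):
        self._selector.register(fd, selectors.EVENT_READ, (callback, args))

    def remove_reader(self, fd):
        self._selector.unregister(fd)

    def run_once(self, timeout=None):
        for key, _mask in self._selector.select(timeout):
            callback, args = key.data
            callback(*args)

# Usage: a plain function is the event handler; no Futures or coroutines involved.
received = []

def on_readable(sock, loop):
    received.append(sock.recv(1024))
    loop.remove_reader(sock)

loop = SelectorLoop()
a, b = socket.socketpair()
loop.add_reader(b, on_readable, b, loop)
a.sendall(b"hello")
loop.run_once(timeout=1.0)
a.close()
b.close()
```

Because the handler is an ordinary function with ordinary arguments, a coroutine scheduler or any other framework can sit on top of this interface without the loop knowing about it.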
While the focus of the I/O loop should be on single-threaded event handling, some standard interface should exist so that you can run certain code in a separate thread and wait for its completion -- I've found this handy when calling socket.getaddrinfo(), which may block. (Apparently async DNS lookups are really hard -- I read some complaints about libevent's DNS lookups, and IIUC many Firefox lockups are due to this.) But there may be other uses for this too.
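One way such an interface could work, sketched under assumptions: the single-threaded loop hands a blocking call such as `socket.getaddrinfo()` to a `concurrent.futures.ThreadPoolExecutor`, and the worker thread passes the result back to the loop thread through a lock-protected list of ready callbacks plus a socketpair that wakes the loop out of `select()`. The class and method names (`ThreadAwareLoop`, `run_in_thread`, `call_soon_threadsafe`) are hypothetical:

```python
import collections
import concurrent.futures
import selectors
import socket
import threading

class ThreadAwareLoop:
    """Single-threaded loop; worker threads hand results back via call_soon_threadsafe."""

    def __init__(self, max_workers=4):
        self._selector = selectors.DefaultSelector()
        self._ready = collections.deque()
        self._lock = threading.Lock()
        self._wake_r, self._wake_w = socket.socketpair()  # self-pipe to interrupt select()
        self._wake_r.setblocking(False)
        self._selector.register(self._wake_r, selectors.EVENT_READ)
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers)
        self._pending = 0               # thread jobs whose callback has not run yet

    def call_soon_threadsafe(self, callback, *args):
        with self._lock:
            self._ready.append((callback, args))
        self._wake_w.send(b"\0")        # wake the loop if it is blocked in select()

    def run_in_thread(self, func, args, callback):
        """Run func(*args) in a worker thread; call callback(result) on the loop thread."""
        self._pending += 1
        future = self._executor.submit(func, *args)
        future.add_done_callback(
            lambda fut: self.call_soon_threadsafe(self._deliver, fut, callback))

    def _deliver(self, future, callback):
        self._pending -= 1
        callback(future.result())       # re-raises here if func raised

    def run(self):
        while self._pending:
            self._selector.select()
            try:
                self._wake_r.recv(4096)  # drain wakeup bytes
            except BlockingIOError:
                pass
            with self._lock:
                ready, self._ready = self._ready, collections.deque()
            for callback, args in ready:
                callback(*args)

    def close(self):
        self._executor.shutdown()
        self._selector.close()
        self._wake_r.close()
        self._wake_w.close()

results = []
loop = ThreadAwareLoop()
loop.run_in_thread(socket.getaddrinfo,
                   ("127.0.0.1", 80, socket.AF_INET, socket.SOCK_STREAM),
                   results.append)
loop.run()
loop.close()
```

A numeric host address keeps the example from depending on a real DNS lookup; the point is only that the user's callback (`results.append`) runs on the loop's thread, never on the worker thread.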
An issue in the design of the I/O loop is the tension between a ready-based and a completion-based design. The typical Unix design (whether based on select or any of the poll variants) is usually ready-based; but on Windows, the only way to get high performance is to base it on IOCP, which is completion-based (i.e. you start a specific async operation, like writing N bytes, and the I/O loop tells you when it is done). I would like people to be able to write fast event handling programs on Windows too, and ideally the only change would be the implementation of the I/O loop. But I don't know how tenable that is given the dramatically different style used by IOCP and the need to use native Windows APIs for all async I/O -- it sounds like we could only do this if the library providing the I/O loop implementation also wrapped all I/O operations, and that may be a bit much.
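The following sketch suggests what wrapping an I/O operation could look like: it offers a completion-style operation (start writing N bytes, get a callback once all of them are written) on top of a ready-based `selectors` loop. `ReadinessLoop` and `start_write` are hypothetical names, and this shows only the easier direction; it says nothing about exposing readiness events on top of IOCP:

```python
import selectors
import socket

class ReadinessLoop:
    """A ready-based loop in the select/poll style: it reports "fd is ready"."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def run(self):
        # Keep dispatching until nothing is registered any more.
        while self.selector.get_map():
            for key, mask in self.selector.select():
                key.data(mask)          # key.data holds the callback

def start_write(loop, sock, data, on_complete):
    """Completion-style operation: write all of `data`, then call on_complete(nbytes).
    Emulated with readiness events; an IOCP-based loop could start an overlapped write."""
    view = memoryview(data)
    sent = 0

    def on_writable(mask):
        nonlocal view, sent
        try:
            n = sock.send(view)
        except BlockingIOError:
            return                      # spurious wakeup: wait for the next write event
        sent += n
        view = view[n:]
        if sent == len(data):           # everything written: report completion
            loop.selector.unregister(sock)
            on_complete(sent)

    sock.setblocking(False)
    loop.selector.register(sock, selectors.EVENT_WRITE, on_writable)

# Usage: one end writes 1 MB, the other end drains it on the same loop.
a, b = socket.socketpair()
b.setblocking(False)
payload = b"x" * 1_000_000
received = bytearray()
done = []
loop = ReadinessLoop()

def on_readable(mask):
    try:
        received.extend(b.recv(65536))
    except BlockingIOError:
        return
    if len(received) == len(payload):
        loop.selector.unregister(b)

loop.selector.register(b, selectors.EVENT_READ, on_readable)
start_write(loop, a, payload, done.append)
loop.run()
a.close()
b.close()
```

The caller of `start_write` only ever sees the completion callback, so a Windows implementation could in principle keep the same signature while issuing the write through IOCP.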
Finally, there should also be some minimal interface so that multiple I/O loops can interact -- at least in the case where one I/O loop belongs to a GUI library. It seems this is a solved problem (as well solved as you can hope for) for Twisted, so we should just adopt their approach.