[Python-Dev] generic async io

Joachim Koenig-Baltes joachim.koenig-baltes at emesgarten.de
Thu Feb 15 21:15:32 CET 2007


dustin at v.igoro.us wrote:
> I think this discussion would be facilitated by teasing the first
> bullet-point from the latter two: the first deals with async IO, while
> the latter two deal with cooperative multitasking.
> 
> It's easy to write a single package that does both, but it's much harder
> to write *two* fairly generic packages with a clean API between them,
> given the varied platform support for async IO and the varied syntax and
> structures (continuations vs. microthreads, in my terminology) for
> multitasking.  Yet I think that division is exactly what's needed.
> 
> Since you asked (I'll assume the check for $0.02 is in the mail), I
> think a strictly-async-IO library would offer the following:
> 
>  - a sleep queue object to which callables can be added
>  - wrappers for all/most of the stdlib blocking IO operations which
>    add the operation to the list of outstanding operations and return
>    a sleep queue object
>    - some relatively easy method of extending that for new IO operations
>  - a poll() function (for multitasking libraries) and a serve_forever()
>    loop (for asyncore-like uses, where all the action is IO-driven)

A centralized approach that wraps all blocking IO operations in the stdlib
could only work in pure Python applications. What about extensions that
integrate e.g. gtk2, gstreamer and other useful libraries that come
with their own low-level IO? Python is not the right place to solve this
problem, and many C libraries have already tried, e.g. GNU Pth, which
implements a pthreads-like API on top of a single kernel thread.

But none of these approaches is perfect. E.g. if you want to read 5
bytes from an fd, you can use FIONREAD on a socket to get the number
of bytes the OS has buffered, so you can be sure not to block; but
FIONREAD on a regular file fd (e.g. on an NFS mount) will not tell you
how many bytes the OS has prefetched, so you might block even if you
are reading only 1 byte.
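To make the socket case concrete, here is a minimal sketch of the FIONREAD query, assuming a Unix platform where termios.FIONREAD and socket.socketpair() are available (the helper name bytes_readable is mine, not stdlib):

```python
import fcntl
import select
import socket
import struct
import termios

def bytes_readable(fd):
    """Ask the kernel how many bytes can be read from fd without blocking."""
    buf = struct.pack('i', 0)
    return struct.unpack('i', fcntl.ioctl(fd, termios.FIONREAD, buf))[0]

a, b = socket.socketpair()          # a local socket pair for the demo
b.sendall(b'hello')
select.select([a], [], [], 1.0)     # wait until the data is visible on a
n = bytes_readable(a.fileno())
print(n)                            # 5: a.recv(5) is now guaranteed not to block
a.close()
b.close()
```

The same ioctl on a regular file descriptor is exactly where the guarantee breaks down, as described above.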

I think it's best to let each task decide how to do its low-level IO:
the task knows what it's doing and how to avoid blocking.

Therefore I propose to decouple waiting for a condition/event from the
actual (potentially) blocking operation. To avoid the blocking itself
there is no need to reinvent the wheel: the socket module already
provides non-blocking modes for network IO, and a number of C libraries
do it in a portable way, though none is perfect.

And based on these events it is much easier to design a scheduler than
to write one that also has to perform the non-blocking IO operations
itself in order to give the tasks the illusion of a blocking call.
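A rough sketch of this decoupling, using only select() and the socket module: the loop below merely waits and dispatches; the task's own callback performs the (guaranteed non-blocking) read. The EventLoop class and its method names are illustrative, not any existing API:

```python
import select
import socket

class EventLoop:
    """Minimal event waiter: knows nothing about IO, only about readiness."""

    def __init__(self):
        self.readers = {}  # fd -> callback to run when fd is readable

    def wait_readable(self, fd, callback):
        self.readers[fd] = callback

    def run_once(self, timeout=None):
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        for fd in ready:
            self.readers.pop(fd)(fd)   # one-shot dispatch

loop = EventLoop()
a, b = socket.socketpair()
a.setblocking(False)                   # the task chose non-blocking mode itself

received = []
def on_readable(fd):
    # select() reported readiness, so this recv() will not block
    received.append(a.recv(4096))

loop.wait_readable(a.fileno(), on_readable)
b.sendall(b'ping')
loop.run_once(timeout=1.0)
print(received)                        # [b'ping']
```

Note that the loop never calls recv() itself; whether to use FIONREAD, non-blocking mode, or something library-specific stays the task's decision.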

BSD's kevent is the most powerful event waiting mechanism with kernel
support (it unifies waiting on different events on different kinds of
resources: fds, processes, timers, signals), but its API can be emulated
to a certain degree with other event mechanisms such as inotify on Linux
or Niels Provos' libevent.

The real showstopper for making the local event waiting easy is the
lack of coroutines, or at least of a form of non-local goto like
setjmp/longjmp in C (that is what greenlets provide). Remember that
yield only suspends the current function, so every function on the
stack must be prepared to forward the yield, even if it is not
interested in it (hiding this fact with decorators does not make it
better, IMO).
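The limitation is easy to demonstrate. In this sketch (the event tuples and function names are invented for illustration) the helper that wants to wait must be a generator, and every caller up the stack must forward its yields by hand, since a plain function call cannot suspend the whole stack:

```python
def read_line(sock_fd):
    # This helper wants to wait for readability, so it must be a
    # generator that yields a request up to the scheduler...
    yield ('readable', sock_fd)
    # ...and would then perform the actual non-blocking read (elided).

def handle_client(sock_fd):
    # ...which forces every intermediate caller to be a generator too,
    # forwarding the inner yields one by one (there is no shortcut
    # for delegating to a sub-generator):
    for event in read_line(sock_fd):
        yield event

events = list(handle_client(7))
print(events)   # [('readable', 7)] -- the request bubbled up frame by frame
```

With real coroutines (or a setjmp/longjmp-style switch, as greenlets provide), read_line could suspend directly without handle_client cooperating.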

Joachim