[Python-3000] Non-blocking I/O? (Draft PEP for New IO system)

Wed Mar 7 00:15:40 CET 2007

Reading this and all the other discussion on the proper semantics for
non-blocking I/O I think I may have overreached in trying to support
non-blocking I/O at all levels of the new I/O stack. There probably
aren't enough use cases for wanting to support readline() returning
None if no full line of input is available yet to warrant the
additional complexities -- and I haven't even looked very carefully at
incremental codecs, which introduce another (small) buffer.

I think maybe a useful simplification would be to support special
return values to capture EWOULDBLOCK (or equivalent) in the raw I/O
interface only. I think it serves a purpose here, since without such
support, code doing raw I/O would either require catching IOError all
the time and inspecting it for EWOULDBLOCK (or other platform specific
values!), or not using the raw I/O interface at all, requiring yet
another interface for raw non-blocking I/O.
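
To make the raw-layer idea concrete, here is a minimal sketch, assuming a POSIX-style file descriptor and a present-day Python in which EWOULDBLOCK surfaces as BlockingIOError (a subclass of OSError, which IOError now aliases). The helper name raw_read is hypothetical and not part of any proposed API; it only shows a would-block condition being turned into a special return value instead of an exception.

```python
import os

def raw_read(fd, n):
    """Read up to n bytes from fd.

    Returns bytes on success, b'' at EOF, and None if the descriptor is
    non-blocking and no data is available yet (EWOULDBLOCK/EAGAIN).
    Hypothetical helper, for illustration only.
    """
    try:
        return os.read(fd, n)
    except BlockingIOError:
        return None

r, w = os.pipe()
os.set_blocking(r, False)   # put the read end into non-blocking mode

print(raw_read(r, 100))     # None: nothing written yet
os.write(w, b"hello")
print(raw_read(r, 100))     # b'hello'
os.close(w)
print(raw_read(r, 100))     # b'': writer closed, so this is EOF
os.close(r)
```

The caller can tell the three outcomes apart (data, EOF, would block) without inspecting errno values.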

The buffering layer could then raise IOError (or perhaps a special
subclass of it) if the raw I/O layer ever returned one of these; e.g.
if a buffered read needs to go to the raw layer to satisfy a request
and the raw read returns None, then the buffered read needs to raise
this error if no data has been taken out of the buffer yet; or it
should return a short read if some data was already consumed (since
it's hard to "unconsume" data, especially if the requested read length
is larger than the buffer size, or if there's an incremental decoder
involved). Thus, applications can assume that a short read means
either EOF or nonblocking I/O; most apps can safely ignore the latter
since it must be explicitly turned on by the app.
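
The read rule above can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed implementation: WouldBlockError, ScriptedRaw and buffered_read are hypothetical names, the raw layer is a scripted stand-in whose read() returns None for "would block", and the sketch keeps no internal buffer of its own.

```python
class WouldBlockError(IOError):
    """Hypothetical IOError subclass: the raw layer would block."""

class ScriptedRaw:
    """Stand-in raw layer. Each scripted item is bytes, or None for
    'would block'. An exhausted script means EOF (b'')."""
    def __init__(self, results):
        self._results = list(results)

    def read(self, n):
        if not self._results:
            return b""
        chunk = self._results.pop(0)
        if chunk is None:
            return None
        head, tail = chunk[:n], chunk[n:]
        if tail:
            self._results.insert(0, tail)  # keep the unread remainder
        return head

def buffered_read(raw, n):
    """Try to read n bytes, applying the rule described above."""
    data = b""
    while len(data) < n:
        chunk = raw.read(n - len(data))
        if chunk is None:
            if not data:
                # Nothing consumed yet: report the condition as an error.
                raise WouldBlockError("no data available yet")
            break  # some data already consumed: return a short read
        if not chunk:
            break  # EOF
        data += chunk
    return data

raw = ScriptedRaw([b"abc", None, b"def"])
print(buffered_read(raw, 6))  # b'abc' (short read: raw layer would block)
print(buffered_read(raw, 6))  # b'def' (short read: EOF)
```

A read that hits the would-block condition before consuming anything, for example buffered_read(ScriptedRaw([None]), 6), raises WouldBlockError instead.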

For writing, if the buffering layer receives a short write, it should
try again; but if it receives an EWOULDBLOCK, it should likewise raise
the abovementioned error, since repeated attempts to write in this
case would just end up spinning the CPU without making progress. (We
should not raise an error if a single short write happens, since AFAIK
this is possible for TCP sockets even in blocking mode; witness the
addition of the sendall() method.)
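
The write rule can be sketched the same way. Again the names WouldBlockError, ScriptedRawWriter and write_all are hypothetical, and the raw layer is a scripted stand-in whose write() returns the number of bytes accepted, or None for "would block".

```python
class WouldBlockError(IOError):
    """Hypothetical IOError subclass: the raw layer would block."""

class ScriptedRawWriter:
    """Stand-in raw layer. Each scripted item caps how many bytes one
    write() call accepts; None means 'would block'. Once the script is
    exhausted, every write is accepted in full."""
    def __init__(self, limits):
        self._limits = list(limits)
        self.written = bytearray()

    def write(self, data):
        limit = self._limits.pop(0) if self._limits else len(data)
        if limit is None:
            return None
        accepted = bytes(data[:limit])
        self.written += accepted
        return len(accepted)

def write_all(raw, data):
    """Retry after short writes; raise if the raw layer would block."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        n = raw.write(view[total:])
        if n is None:
            # Retrying here would only spin the CPU, so report it.
            raise WouldBlockError("raw write would block")
        total += n  # a short write is normal: try again with the rest
    return total

raw = ScriptedRawWriter([2, 3])        # two short writes, then unlimited
print(write_all(raw, b"hello world"))  # 11
print(bytes(raw.written))              # b'hello world'
```

With a script such as [4, None], the first four bytes are accepted and the second attempt raises WouldBlockError, so the caller learns about the condition instead of looping.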

This means that the buffering layer that sits directly on top of the
raw layer must still be prepared to deal with the special return
values from non-blocking I/O, but its API to the next layer up doesn't
need special return values, since it turns these into IOErrors, and
the next layer(s) up won't have to deal with it nor reflect it in
their API.

Would this satisfy the critics of the current design?

--Guido

On 3/4/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 3/4/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > I'm having trouble seeing what the use case is for
> > the buffered non-blocking writes being discussed here.
> >
> > Doing asynchronous I/O usually *doesn't* involve
> > putting the file descriptor into non-blocking mode.
> > Instead you use select() or equivalent, and only
> > try to read or write when the file is reported as
> > being ready.
>
> I can't say which is more common, but non-blocking has a safer feel.
> Normal code would be select-driven in both, but if you screw up with
> non-blocking you get an error, whereas with blocking you get a mysterious
> hang.
>
> accept() is the exception.  It's possible for a connection to
> disappear between the time select() returns and the time you call
> accept(), so you need to be non-blocking to avoid hanging.
>
> >
> > For this to work properly, the select() needs to
> > operate at the *bottom* of the I/O stack. Any
> > buffering layers sit above that, with requests for
> > data propagating up the stack as the file becomes
> > ready.
> >
> > In other words, the whole thing has to have the
> > control flow inverted and work in "pull" mode
> > rather than "push" mode. It's hard to see how this
> > could fit into the model as a minor variation on
> > how writes are done.
>
> Meaning it needs to be a distinct interface and explicitly designed as such.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)