[Python-Dev] How io.IOBase.readline() should behave when used on non-blocking obj and no data available?

Thu Oct 16 16:50:02 CEST 2014

On Thu, Oct 16, 2014 at 4:34 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Thu, 16 Oct 2014 03:54:32 +0300
> Paul Sokolovsky <pmiscml at gmail.com> wrote:
> > Hello,
> >
> > io.RawIOBase.read() is well specified for behavior in case it
> > immediately gets a would-block condition: "If the object is in
> > non-blocking mode and no bytes are available, None is returned."
> > (https://docs.python.org/3/library/io.html#io.RawIOBase.read).
> >
> > However, nothing is said about such condition for io.IOBase.readline(),
> > which is mixin method in a base class, default implementation of which
> > thus would use io.RawIOBase.read(). Looking at 3.4.0 source, iobase.c:
> > iobase_readline() has:
> >
> >         b = _PyObject_CallMethodId(self, &PyId_read, "n", nreadahead);
> > [...]
> >         if (!PyBytes_Check(b)) {
> >             PyErr_Format(PyExc_IOError,
> >                          "read() should have returned a bytes object, "
> >                          "not '%.200s'", Py_TYPE(b)->tp_name);
> >
> > I.e. it's not even ready to receive legitimate return value of None
> > from read(). I didn't try to write a testcase though, so may be missing
> > something.
> >
> > So, how readline() should behave in this case, and can that be
> > specified in the Library Reference?
>
> Well, the problem is that it's not obvious how to implement such methods
> in a non-blocking context.
>
> Let's say some data is received but there isn't a complete line.
> Should readline() return just that data (an incomplete line)? That
> breaks the API's contract. Should readline() buffer the incomplete line
> and keep it for the next readline() call? But then the internal buffer
> becomes unbounded: perhaps there is no new line in the next 4GB of
> incoming data...
>
> And besides, raw I/O objects *shouldn't* have an internal buffer. That's
> the role of the buffered I/O layer.
>

Well, occasionally this occurs, and I think it's reasonable for readline()
to deal with it.

The argument about a 4 GB buffer is irrelevant -- this can happen with a
blocking underlying stream too.

I think that at the point where the readline() function says to itself "I
need more data" it should ask the underlying stream for data. If that
returns an empty string, meaning EOF, readline() is satisfied and returns
whatever it has buffered (even if it's empty). If that returns some bytes
containing a newline, readline() is satisfied, returns the data up to that
point, and buffers the rest (if any). If the underlying stream returns
None, I think it makes sense for readline() to return None too --
attempting to read more will just turn into a busy-wait loop, and that's
the opposite of what should happen.
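
A minimal sketch of the three cases just described, assuming a raw object
whose read() follows the RawIOBase convention (bytes for data, empty bytes
for EOF, None for would-block). The LineReader class and its chunk_size
parameter are hypothetical names used only for illustration; this is not
the io module's implementation.

```python
class LineReader:
    """Hypothetical helper illustrating the proposed readline() semantics."""

    def __init__(self, raw, chunk_size=4096):
        self._raw = raw              # object with a RawIOBase-style read()
        self._buf = bytearray()      # data received but not yet returned
        self._chunk_size = chunk_size

    def readline(self):
        while True:
            # A complete line is already buffered: return it, keep the rest.
            idx = self._buf.find(b"\n")
            if idx >= 0:
                line = bytes(self._buf[:idx + 1])
                del self._buf[:idx + 1]
                return line

            data = self._raw.read(self._chunk_size)
            if data is None:
                # Would block: propagate None instead of busy-waiting.
                # Any partial line stays buffered for the next call.
                return None
            if not data:
                # EOF: return whatever is buffered, even if it is empty.
                line = bytes(self._buf)
                self._buf.clear()
                return line
            self._buf += data
```

Note that the buffer here can still grow without bound if no newline ever
arrives, which is the same situation a blocking stream would be in.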

You may argue that the caller of readline() doesn't expect this. Sure. But
in the end, if the stream is unbuffered and the caller isn't prepared for
that, the caller will always get in trouble. Maybe it'll treat the None as
EOF. That's fine -- it would be the same if it was calling read() on the
underlying stream and it got None (the EOF signalling is the same in both
cases).

At least, by being prepared for the None from the underlying read() in the
readline() code, someone who knows what they are doing can use readline()
on a non-blocking stream -- when they receive None they will have to ask
their selector (or whatever they use) to wait for the underlying FD and
then they can try again.
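
The retry pattern could look roughly like the sketch below. The function
name call_when_readable and its parameters are hypothetical; attempt is
assumed to be any zero-argument callable that returns None when the read
would block (for example a readline() with the semantics proposed above,
or a raw read()), and fileobj is assumed to be something the selectors
module can register on the current platform.

```python
import selectors


def call_when_readable(attempt, fileobj, timeout=None):
    """Call attempt() until it returns something other than None.

    Between attempts, wait until fileobj is reported readable instead of
    spinning in a busy loop.
    """
    with selectors.DefaultSelector() as sel:
        sel.register(fileobj, selectors.EVENT_READ)
        while True:
            result = attempt()
            if result is not None:
                return result
            # None means "would block": sleep until the FD has data.
            if not sel.select(timeout):
                raise TimeoutError("no data arrived before the timeout")
```

A caller would pass something like lambda: reader.readline() as attempt,
together with the underlying file object or descriptor.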

(Alternatively, we could raise BlockingIOError, which is what the OS-level
read() raises if there's no data immediately available on a non-blocking
FD; but it seems that streams have already gotten a convention of returning
None instead, so I think that should be propagated up the stack.)

Oh, BTW, I tested this a little bit. Currently readline() returns an empty
string (or empty bytes, depending on which level you use) when the stream
is nonblocking. I think returning None makes much more sense.

-- 
--Guido van Rossum (python.org/~guido)