[Python-Dev] How io.IOBase.readline() should behave when used on non-blocking obj and no data available?

Wed Oct 22 22:32:26 CEST 2014

Hello,

On Thu, 16 Oct 2014 13:34:06 +0200
Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Thu, 16 Oct 2014 03:54:32 +0300
> Paul Sokolovsky <pmiscml at gmail.com> wrote:
> > Hello,
> > 
> > io.RawIOBase.read() is well specified for behavior in case it
> > immediately gets a would-block condition: "If the object is in
> > non-blocking mode and no bytes are available, None is returned."
> > (https://docs.python.org/3/library/io.html#io.RawIOBase.read).
> > 
> > However, nothing is said about such a condition for
> > io.IOBase.readline(), which is a mixin method in a base class, so
> > its default implementation would use io.RawIOBase.read().
> > Looking at the 3.4.0 source, iobase_readline() in iobase.c has:
> > 
> >         b = _PyObject_CallMethodId(self, &PyId_read, "n", nreadahead);
> >         [...]
> >         if (!PyBytes_Check(b)) {
> >             PyErr_Format(PyExc_IOError,
> >                          "read() should have returned a bytes object, "
> >                          "not '%.200s'", Py_TYPE(b)->tp_name);
> > 
> > I.e., it isn't even prepared to handle the legitimate return value
> > of None from read(). I didn't try to write a test case, though, so
> > I may be missing something.
> > 
> > So, how should readline() behave in this case, and can that be
> > specified in the Library Reference?
> 
> Well, the problem is that it's not obvious how to implement such
> methods in a non-blocking context.
> 
> Let's say some data is received but there isn't a complete line.
> Should readline() return just that data (an incomplete line)? That
> breaks the API's contract. Should readline() buffer the incomplete
> line and keep it for the next readline() call? But then the internal
> buffer becomes unbounded: perhaps there is no new line in the next
> 4GB of incoming data...
> 
> And besides, raw I/O objects *shouldn't* have an internal buffer.
> That's the role of the buffered I/O layer.
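
For reference, a minimal sketch of the read() contract quoted above. FakeNonBlockingRaw is a hypothetical class (not part of io) that simulates a would-block condition by returning None from readinto(); in CPython, the inherited io.RawIOBase.read() passes that None straight through to the caller. That None is exactly the kind of value the quoted iobase_readline() code rejects as "not a bytes object".

```python
import io


class FakeNonBlockingRaw(io.RawIOBase):
    """Hypothetical raw stream simulating a non-blocking source.

    Each entry of `chunks` is what one read attempt can see; an empty
    entry b"" stands for "no bytes available right now" (would block).
    """

    def __init__(self, chunks):
        super().__init__()
        self._chunks = list(chunks)

    def readable(self):
        return True

    def readinto(self, b):
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks[0]
        if not chunk:
            self._chunks.pop(0)
            return None  # would block: the documented RawIOBase convention
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            self._chunks[0] = chunk[n:]  # the rest stays "in the OS"
        else:
            self._chunks.pop(0)
        return n


raw = FakeNonBlockingRaw([b"ab", b"", b"c\n"])
# The inherited io.RawIOBase.read() calls readinto() and returns its None.
results = [raw.read(10) for _ in range(4)]
```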

Yes, sure: readline() is defined on io.IOBase, which is unspecified with
respect to buffering, so it should have behavior that can be implemented
for both the buffered and the unbuffered case.

You're also right that readline() on a non-blocking stream can't always
work the same way as the blocking version, and that it "breaks the
API's contract". But it should be possible to extend that contract to
non-blocking readline() in a pretty natural way:

1) An invariant of readline() is that it doesn't modify stream data,
it just segments it. So, looping readline() + write() until EOF
produces the same result as looping read(N) + write(). A non-blocking
readline() would still satisfy this.
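
A minimal sketch of invariant 1, assuming the semantics proposed here: a non-blocking readline() returns data up to and including the next newline, or whatever is available if there is no newline yet, None when it would block, and b"" at EOF. NonBlockingLineReader is a hypothetical model of those semantics, not an io class; the helper names copy_by_lines() and copy_by_chunks() are also illustrative.

```python
import io
from typing import List, Optional


class NonBlockingLineReader:
    """Hypothetical model of the proposed non-blocking readline().

    `arrivals` is what the OS would deliver on successive read attempts;
    an empty chunk b"" stands for "no data available yet" (would block).
    """

    def __init__(self, arrivals: List[bytes]):
        self._arrivals = list(arrivals)
        self._pending = b""  # unreturned rest of the last chunk, never more

    def readline(self) -> Optional[bytes]:
        if not self._pending:
            if not self._arrivals:
                return b""  # EOF, same convention as blocking readline()
            chunk = self._arrivals.pop(0)
            if not chunk:
                return None  # would block, like RawIOBase.read()
            self._pending = chunk
        # Return up to and including the next newline, or everything
        # available if there is none (an incomplete line).
        nl = self._pending.find(b"\n")
        end = nl + 1 if nl >= 0 else len(self._pending)
        line, self._pending = self._pending[:end], self._pending[end:]
        return line


def copy_by_lines(src, dst) -> None:
    """Loop readline() + write() until EOF, skipping would-block results."""
    while True:
        line = src.readline()
        if line is None:
            continue  # a real program would wait for readiness here
        if not line:
            break
        dst.write(line)


def copy_by_chunks(src, dst, n: int = 7) -> None:
    """Loop read(N) + write() until EOF."""
    while True:
        chunk = src.read(n)
        if not chunk:
            break
        dst.write(chunk)


data = b"first line\nsecond line\nno newline"

# Blocking case: io.BytesIO never blocks.
by_lines, by_chunks = io.BytesIO(), io.BytesIO()
copy_by_lines(io.BytesIO(data), by_lines)
copy_by_chunks(io.BytesIO(data), by_chunks)

# Non-blocking case: same bytes, arriving in pieces with would-block gaps.
arrivals = [b"first li", b"", b"ne\nsecond", b"", b" line\nno newline"]
nonblocking = io.BytesIO()
copy_by_lines(NonBlockingLineReader(arrivals), nonblocking)
```

In this model, readline() never holds back an incomplete line, so the only state it keeps is the unconsumed remainder of one chunk; that is one way to sidestep the unbounded-buffer concern, at the cost of returning partial lines.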

2) Even blocking readline() can return a string that doesn't end with
end-of-line character(s). For blocking readline(), this can happen only
with the last line of a stream; for non-blocking, it may happen on any
call. The point is that even with blocking readline(), the caller must
be ready to check whether a line satisfies its "complete line"
criteria; in the non-blocking case, there will just be a different set
of criteria and actions to satisfy them.
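
A caller-side sketch of point 2, again assuming the proposed semantics (a possibly incomplete line, None on would-block, b"" at EOF). complete_lines() is a hypothetical helper, and the scripted readline in the usage below stands in for a real non-blocking stream.

```python
from typing import Callable, Iterator, Optional


def complete_lines(
    readline: Callable[[], Optional[bytes]]
) -> Iterator[Optional[bytes]]:
    """Reassemble complete lines from a non-blocking readline().

    Yields a line ending in a newline (or the final unterminated line at
    EOF), or None whenever readline() would block, so the caller can wait
    for readiness and then resume the generator.
    """
    partial = b""
    while True:
        piece = readline()
        if piece is None:  # would block: nothing available yet
            yield None
            continue
        if piece == b"":  # EOF
            if partial:
                yield partial  # last line without a newline
            return
        partial += piece
        if partial.endswith(b"\n"):  # the caller's "complete line" check
            yield partial
            partial = b""


# Scripted results of successive non-blocking readline() calls.
script = iter([b"hel", None, b"lo\n", b"wor", b"ld", b""])
results = list(complete_lines(lambda: next(script)))
```

Here the growth of `partial` is the caller's responsibility, so an application can apply its own policy (for example, a maximum line length) instead of the I/O layer having to pick one; in real code, the None case would typically be handled by waiting with the selectors module.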

I think defining non-blocking readline() this way is better than
leaving it unspecified whether it's supported at all (and if so, how),
or prohibiting it.

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com