[Python-bugs-list] [ python-Bugs-521782 ] unreliable file.read() error handling

noreply@sourceforge.net noreply@sourceforge.net
Tue, 12 Nov 2002 07:58:44 -0800

Bugs item #521782, was opened at 2002-02-23 13:44
You can respond by visiting: 

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Marius Gedminas (mgedmin)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: unreliable file.read() error handling

Initial Comment:
fread(3) manual page states
       fread  and  fwrite return the number of items
       read or written (i.e., not the number of
characters).   If
       an error occurs, or the end-of-file is reached,
the return
       value is a short item count (or zero).

Python only checks ferror status when the return value
is zero (Objects/fileobject.c line 550 from
Python-2.1.2 sources).  I agree that it is a good idea
to delay exception throwing until after the user has
processed the partial chunk of data returned by fread,
but there are two problems with the current
implementation: loss of errno and occasional loss of data.

Both problems are illustrated with this scenario taken
from real life:

  suppose the file descriptor refers to a pipe, and we
set O_NONBLOCK mode with fcntl (the application was
reading from multiple pipes in a select() loop and
couldn't afford to block)
  fread(4096) returns an incomplete block and sets
errno to EAGAIN
  chunksize != 0 so we do not check ferror() and return
  the next time file.read() is called we reset errno
and do fread(4096) again.  It returns a full block
(i.e. bytesread == buffersize on line 559), so we
repeat the loop and call fread(0).  It returns 0, of
course.  Now we check ferror() and find it was set
(by a previous fread(4096) called maybe a century ago).
The errno information is already lost, so we throw
an IOError with errno=0.  And also lose that 4K chunk
of valuable user data.

Regarding solutions, I can see two alternatives:
- call clearerr(f->f_fp) just before fread(), where
Python currently sets errno = 0;  This makes sure that
we do not have stale ferror() flag and errno is valid,
but we might not notice some errors.  That doesn't
matter for EAGAIN, and for errors that occur reliably
if we repeat fread() on the same stream.  We might
still lose data if an exception is thrown on the second
or later loop iteration.
- always check for ferror() immediatelly after fread().
- regarding data loss, maybe it is possible to store
the errno somewhere inside the file object and delay
exception throwing if we have successfully read some
data (i.e. bytesread > 0).  The exception could be
thrown on the next call to file.read() before
performing anything else.


>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-12 15:58

Logged In: YES 

Great catch Marius! Thank you very much!

I could identify the following problems from your report:

- If (requestsize == -1) and after a few iterations, fread()
  signals EWOULDBLOCK with (chunksize == 0), all read
  data is lost because an exception is raised.

- When (chunksize != 0) and EWOULDBLOCK is signaled,
  ferror() will be flagged. Then the next iteration will
fail if:
  1) read data has length 0 because EWOULDBLOCK is
  signaled again but errno is still set to 0 (I don't know why
  errno is not reset in that case); 2) read data has length 0
  because a block with the exact requested size is read.

The attached tests show these problems. The first problem
is a little bit harder to trigger. I had to preset SMALLCHUNK
to 4096 in fileobject.c, because my Linux refused to return
a chunk greater than that from fread(). The second problem
should be visible without further tweakings.

I have also attached a solution to the problem. This solution
works because:

- If (chunksize == 0), (errno == EWOULDBLOCK) and
  (readbytes > 0), there's no point in raising an exception, as
  we got some data for the user, just like when
  (bytesread < buffersize).

- When (chunksize != 0) and EWOULDBLOCK is signaled, it's
  not considered an error. OTOH, ferror() is still being set.
  clearerr() should be called in the stream before breaking the

- If (bytesread == buffersize) and (bytesrequest >= 0), it means
  that we already got all the requested data, and thus
there's no
  point in trying to read more data from the file and forcing
  (chunksize == 0) without ferror() being set.


You can respond by visiting: