[Python-bugs-list] [ python-Bugs-521782 ] unreliable file.read() error handling

Mon, 11 Nov 2002 21:19:28 -0800

Bugs item #521782, was opened at 2002-02-23 13:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=521782&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Marius Gedminas (mgedmin)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: unreliable file.read() error handling

Initial Comment:
fread(3) manual page states
       fread and fwrite return the number of items successfully
       read or written (i.e., not the number of characters).  If
       an error occurs, or the end-of-file is reached, the return
       value is a short item count (or zero).

Python only checks ferror status when the return value
is zero (Objects/fileobject.c line 550 from
Python-2.1.2 sources).  I agree that it is a good idea
to delay exception throwing until after the user has
processed the partial chunk of data returned by fread,
but there are two problems with the current
implementation: loss of errno and occasional loss of data.

Both problems are illustrated with this scenario taken
from real life:

- Suppose the file descriptor refers to a pipe, and we set O_NONBLOCK
  mode with fcntl (the application was reading from multiple pipes in
  a select() loop and couldn't afford to block).
- fread(4096) returns an incomplete block and sets errno to EAGAIN.
- chunksize != 0, so we do not check ferror() and return successfully.
- The next time file.read() is called we reset errno and do
  fread(4096) again.  It returns a full block (i.e. bytesread ==
  buffersize on line 559), so we repeat the loop and call fread(0).
  It returns 0, of course.  Now we check ferror() and find it was set
  (by a previous fread(4096) called maybe a century ago).  The errno
  information is already lost, so we throw an IOError with errno=0,
  and we also lose that 4K chunk of valuable user data.
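
To make the sequence above concrete, here is a toy model in Python.  It
is not the code from Objects/fileobject.c: FakeStream, its sticky error
flag, buggy_read and demo are hypothetical names reconstructed only from
the description in this report, and the 100-byte short read is an
arbitrary assumption.

```python
import errno


class FakeStream:
    """Toy stand-in for a stdio stream (an assumption, not real stdio)."""

    def __init__(self, script):
        # script: list of (data, errno_value) pairs, one per non-empty fread.
        self._script = list(script)
        self.error = False  # models ferror(): sticky until explicitly cleared
        self.errno = 0      # models the C errno variable

    def fread(self, size):
        if size == 0 or not self._script:
            return b""  # nothing requested, or nothing left
        data, err = self._script.pop(0)
        if err:
            self.error = True
            self.errno = err
        return data[:size]


def buggy_read(stream, bufsize):
    """Models the reported logic: the flag is checked only on a zero return."""
    stream.errno = 0  # errno is reset, but the sticky error flag is not
    out = b""
    while True:
        chunk = stream.fread(bufsize - len(out))
        if not chunk:
            if not stream.error:
                break
            # Stale flag from an earlier call: errno is 0 and `out` is dropped.
            raise OSError(stream.errno, "read failed")
        out += chunk
        if len(out) < bufsize:
            break  # short read: return what we have without checking the flag
    return out


def demo():
    stream = FakeStream([(b"x" * 100, errno.EAGAIN), (b"y" * 4096, 0)])
    first = buggy_read(stream, 4096)  # short read, error silently ignored
    try:
        buggy_read(stream, 4096)      # full block is read, then discarded
    except OSError as exc:
        return len(first), exc.errno
    return len(first), None
```

In this model the first call hands back the short block without any
complaint, and the second call raises with errno 0 after having already
consumed a full 4096-byte block.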

Regarding solutions, I can see two alternatives, plus a possible remedy for the data loss:
- call clearerr(f->f_fp) just before fread(), where
Python currently sets errno = 0.  This makes sure that
we do not have a stale ferror() flag and errno is valid,
but we might not notice some errors.  That doesn't
matter for EAGAIN, and for errors that occur reliably
if we repeat fread() on the same stream.  We might
still lose data if an exception is thrown on the second
or later loop iteration.
- always check for ferror() immediately after fread().
- regarding data loss, maybe it is possible to store
the errno somewhere inside the file object and delay
exception throwing if we have successfully read some
data (i.e. bytesread > 0).  The exception could be
thrown on the next call to file.read() before
performing anything else.
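
The last idea (remember the error, hand back the partial data, raise on
the next call) might look roughly like this.  It is a Python-level
sketch of the behaviour, not a patch to fileobject.c.
DeferredErrorReader and raw_read are hypothetical names, and raw_read is
assumed to behave like os.read on a non-blocking descriptor: it returns
bytes, returns b"" at end of file, and raises OSError on failure.

```python
import errno


class DeferredErrorReader:
    """Hypothetical wrapper: partial data first, saved error on the next call."""

    def __init__(self, raw_read):
        # raw_read(n) is an assumed callable, e.g. functools.partial(os.read, fd).
        self._raw_read = raw_read
        self._pending = None  # error saved from a previous call, if any

    def read(self, size):
        # Raise a deferred error before performing anything else.
        if self._pending is not None:
            err, self._pending = self._pending, None
            raise err
        chunks = []
        remaining = size
        while remaining > 0:
            try:
                chunk = self._raw_read(remaining)
            except OSError as exc:
                if not chunks:
                    raise  # nothing read yet: report the error right away
                self._pending = exc  # keep errno, return the data first
                break
            if not chunk:
                break  # end of file
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
```

The error object carries its errno with it, so nothing is lost between
the call that hit the error and the call that reports it, and the caller
always gets the bytes that were read before the failure.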

----------------------------------------------------------------------
