[Python-bugs-list] [ python-Bugs-521782 ] unreliable file.read() error handling

Thu, 28 Nov 2002 03:00:03 -0800

Bugs item #521782, was opened at 2002-02-23 14:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=521782&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 8
Submitted By: Marius Gedminas (mgedmin)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: unreliable file.read() error handling

Initial Comment:
fread(3) manual page states
       fread  and  fwrite return the number of items
successfully
       read or written (i.e., not the number of
characters).   If
       an error occurs, or the end-of-file is reached,
the return
       value is a short item count (or zero).

Python only checks ferror status when the return value
is zero (Objects/fileobject.c line 550 from
Python-2.1.2 sources).  I agree that it is a good idea
to delay exception throwing until after the user has
processed the partial chunk of data returned by fread,
but there are two problems with the current
implementation: loss of errno and occasional loss of data.

Both problems are illustrated with this scenario taken
from real life:

  suppose the file descriptor refers to a pipe, and we
set O_NONBLOCK mode with fcntl (the application was
reading from multiple pipes in a select() loop and
couldn't afford to block)
  fread(4096) returns an incomplete block and sets
errno to EAGAIN
  chunksize != 0 so we do not check ferror() and return
successfully
  the next time file.read() is called we reset errno
and do fread(4096) again.  It returns a full block
(i.e. bytesread == buffersize on line 559), so we
repeat the loop and call fread(0).  It returns 0, of
course.  Now we check ferror() and find it was set
(by a previous fread(4096) called maybe a century ago).
The errno information is already lost, so we throw
an IOError with errno=0.  And also lose that 4K chunk
of valuable user data.

Regarding solutions, I can see two alternatives:
- call clearerr(f->f_fp) just before fread(), where
Python currently sets errno = 0;  This makes sure that
we do not have stale ferror() flag and errno is valid,
but we might not notice some errors.  That doesn't
matter for EAGAIN, and for errors that occur reliably
if we repeat fread() on the same stream.  We might
still lose data if an exception is thrown on the second
or later loop iteration.
- always check for ferror() immediatelly after fread().
- regarding data loss, maybe it is possible to store
the errno somewhere inside the file object and delay
exception throwing if we have successfully read some
data (i.e. bytesread > 0).  The exception could be
thrown on the next call to file.read() before
performing anything else.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-11-28 12:00

Message:
Logged In: YES 
user_id=21627

I'm still missing proposed documentation changes.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-27 16:58

Message:
Logged In: YES 
user_id=7887

I'm attaching a new revised patch. I'll also increase the
priority, to reflect the serious problems this bug could cause.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-18 17:18

Message:
Logged In: YES 
user_id=7887

You're right about EAGAIN. The behavior I mentioned seems to
be valid only for old Unixes, as presented here:

http://www.gnu.org/manual/glibc/html_mono/libc.html

So indeed we should check for both. OTOH, the above URL also
mentions that EAGAIN can happen in blocking situations as well:

"On some systems, reading a large amount of data from a
character special file can also fail with EAGAIN if the
kernel cannot find enough physical memory to lock down the
user's pages."

Also, the statement in the documentation you mention isn't
true even without my patch. We currently check for
"readbytes < requestedbytes" and break the loop if it
happens. Indeed, that's one of the reasons that
created the second problem I've mentioned. Enforcing this
behavior, I could see that mentioned in the URL above:

"Any condition that could result in EAGAIN can instead
result in a successful read which returns fewer bytes than
requested. Calling read again immediately would result in
EAGAIN."

Thusly, applying this patch we wouldn't be introducing a new
behavior, but just enforcing a single behavior in every case.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-18 15:18

Message:
Logged In: YES 
user_id=21627

Depends on what system you have. On Linux-i386, we have

#define EWOULDBLOCK     EAGAIN

so it is necessarily the same thing. Can you report on what
system EAGAIN happens even if the file descriptor is
non-blocking?

As for documentation changes: The documentation currently says

   Read at most size bytes from the file (less if the read
hits EOF before obtaining size bytes)

I believe this statement is no longer true: with your
changes, it will return less data even without hitting EOF.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-18 14:50

Message:
Logged In: YES 
user_id=7887

Martin, EAGAIN is not the same as EWOULDBLOCK. While
EWOULDBLOCK will only be issued in a situation where the
filedescriptor is marked as non-blocking, EAGAIN can happen
at any time, including when requestsize == -1 (read
everything from the fd).

If we're going to handle EAGAIN, we must ensure that the
loop is not broken when it happens, instead of using the
same solution proposed for EWOULDBLOCK.

Do you think we should do that as well?

You're right about EWOULDBLOCK not being available. I'll
include the required #ifdef, and also the suggested comment.

Could you please clarify what you mean in this sentence: "In
any case, please update the documentation to describe the
special cases that you add.". IMO, the proposed patch is not
adding special cases. It's just fixing the algorithm to
behave the same way in all situations. In that case, what
doucmentation should be updated, in your opinion? Or perhaps
you're talking about the proposed e.data solution, which
would be used in other cases like EINTR similars?

Thank you!

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-14 09:29

Message:
Logged In: YES 
user_id=21627

I think the processing of the error condition is wrong, in a
number of ways:
- Posix allows for both EAGAIN and EWOULDBLOCK to be
signalled in this case.
- Neither EAGAIN nor EWOULDBLOCK are guaranteed to exist on
all systems.
- Please add a comment to the break; to indicate what the
purpose of this code is.

Also, I think there are a number of other cases where
bytesread might be non-zero, e.g. if EINTR was signalled.
It's not clear what the best solution would be, I propose
that the exception carries an attribute "data" to denote the
short data. Of course, in the EINTR case, the original
exception is lost and a KeyboardInterrupt is raised instead,
so putting the data into the exception might be pointless...

In any case, please update the documentation to describe the
special cases that you add.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-12 17:43

Message:
Logged In: YES 
user_id=7887

Martin, could you please review that?

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2002-11-12 16:58

Message:
Logged In: YES 
user_id=7887

Great catch Marius! Thank you very much!

I could identify the following problems from your report:

- If (requestsize == -1) and after a few iterations, fread()
  signals EWOULDBLOCK with (chunksize == 0), all read
  data is lost because an exception is raised.

- When (chunksize != 0) and EWOULDBLOCK is signaled,
  ferror() will be flagged. Then the next iteration will
fail if:
  1) read data has length 0 because EWOULDBLOCK is
  signaled again but errno is still set to 0 (I don't know why
  errno is not reset in that case); 2) read data has length 0
  because a block with the exact requested size is read.

The attached tests show these problems. The first problem
is a little bit harder to trigger. I had to preset SMALLCHUNK
to 4096 in fileobject.c, because my Linux refused to return
a chunk greater than that from fread(). The second problem
should be visible without further tweakings.

I have also attached a solution to the problem. This solution
works because:

- If (chunksize == 0), (errno == EWOULDBLOCK) and
  (readbytes > 0), there's no point in raising an exception, as
  we got some data for the user, just like when
  (bytesread < buffersize).

- When (chunksize != 0) and EWOULDBLOCK is signaled, it's
  not considered an error. OTOH, ferror() is still being set.
  clearerr() should be called in the stream before breaking the
  loop.

- If (bytesread == buffersize) and (bytesrequest >= 0), it means
  that we already got all the requested data, and thus
there's no
  point in trying to read more data from the file and forcing
  (chunksize == 0) without ferror() being set.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=521782&group_id=5470