feof status (was: Re: [Python-Dev] Rehabilitating fgets)

Eric S. Raymond esr@thyrsus.com
Mon, 8 Jan 2001 13:01:37 -0500


Guido van Rossum <guido@python.org>:
> Eric, before we go further, can you give an exact definition of
> EOFness to me?

A file is at EOF when attempts to read more data from it will fail,
returning no data.
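For a seekable stream, that definition can be checked directly: read one character, then seek back to where you were. Here is a minimal Python sketch, assuming the stream supports tell() and seek(). Pipes and sockets do not, which is why this alone doesn't settle the problem. `at_eof` is a hypothetical helper name, not an existing file-object method.

```python
def at_eof(f):
    """Hypothetical helper: True if the next read from f would return no data.

    Assumes f is seekable (plain files, StringIO, BytesIO); pipes and
    sockets will raise on tell()/seek().
    """
    pos = f.tell()    # remember the current position
    data = f.read(1)  # try to read one more character (or byte)
    f.seek(pos)       # restore the stream exactly as it was
    return not data   # an empty result means nothing is left to read
```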

> What's wrong with just setting the parser loose on the input and
> letting it deal with EOF?

Nothing wrong in theory, but it's a problem in practice.  I don't want
to import the second parser unless it's actually needed, because it's much
larger than the first one.

>                                In your example, apparently a line
> containing the word "history" signals that the rest of the file must
> be parsed by the second parser.  What if "history" is the last line of
> the file?  The eof() test can't tell you *that*!

Right.  That case never happens.  I mean it *really* never happens :-).

What we're talking about is a game system.  The first parser recognizes
a spec language for describing games of a particular class (variants of
Diplomacy, if that's meaningful to you).  The system keeps logfiles which
consist of a section in the game description language, optionally
followed by the token "history" and an order log.

The parser for the order log language is a *lot* larger than the one
for the description language.  This is why I said I don't want the
first parser to just call the second.  I want to test for EOF to
know whether I have to import the second parser at all!

Here's the beginning of my problem: the first parser can't export a line
buffer, because it doesn't *have* a line buffer.  It's a subclass of
shlex and does single-character reads.

There are two ways I can cope with this.  One is to do a (nonzero)
length read after the first parser exits; the other is to have the
first parser set a state flag controlling whether the second parser
loads.
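The state-flag option might look roughly like the sketch below. It assumes the description language can be reduced to bare word tokens. `DescriptionLexer`, `parse_description`, and `history_follows` are hypothetical names, not taken from the actual game system.

```python
import shlex


class DescriptionLexer(shlex.shlex):
    """Hypothetical stand-in for the first parser (a shlex subclass)."""

    def __init__(self, instream):
        super().__init__(instream)
        self.history_follows = False  # the state flag the caller inspects

    def parse_description(self):
        """Collect tokens until the 'history' marker or end of input."""
        tokens = []
        while True:
            tok = self.get_token()
            if tok == self.eof:      # '' in shlex's default non-POSIX mode
                break
            if tok == "history":     # order log follows; stop here
                self.history_follows = True
                break
            tokens.append(tok)
        return tokens
```

After parse_description() returns, the caller would import the order-log parser only when history_follows is true. The catch is that the flag duplicates information that really belongs to the stream.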

This is where it bites that I can't test for EOF with a read(0). The
second shlex parser only has token-level pushback!  If I do a
nonzero-length read and I get data, I'm screwed.  On the other hand
(as I said before) setting a lexer state flag seems wrong, because
EOFness is a property of the underlying stream rather than the parser.
I'd be duplicating state that exists in the stdio stream structure
anyway; it ought to be accessible.
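Note that `f.read(0)` returns an empty string whether or not data remains, so it carries no EOF information. One workaround, not the one pursued here, is to give the stream itself one character of lookahead so that EOF can be tested without losing data. The sketch below assumes that shlex's basic tokenizing calls only `read()` and `readline()` on its input stream. That holds for the standard-library shlex as far as I know, but treat it as an assumption. `PeekableStream` and its `at_eof` method are hypothetical names.

```python
class PeekableStream:
    """Hypothetical wrapper giving a text stream one character of lookahead,
    so EOF can be tested without consuming data the lexer still needs."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = ""  # at most one character read ahead, not yet consumed

    def at_eof(self):
        # Peek one character only if we are not already holding one.
        if not self._pending:
            self._pending = self._stream.read(1)
        return self._pending == ""

    def read(self, n=-1):
        head, self._pending = self._pending, ""
        if n is None or n < 0:
            return head + self._stream.read()
        if n == 0:
            self._pending = head  # read(0) consumes nothing
            return ""
        return head + self._stream.read(n - len(head))

    def readline(self):
        head, self._pending = self._pending, ""
        if head == "\n":
            return head  # the peeked character already ends the line
        return head + self._stream.readline()
```

This keeps the lookahead out of the parser. However, it still duplicates state that the stdio buffer already holds, which is exactly the objection raised above.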

> > Now, another and more general way to handle this would be to make an
> > equivalent of the old FIONREAD ioctl part of Python's standard set of
> > file object methods -- a way to ask "how many bytes are ready to be
> > read in this stream?"
> 
> There's no portable way to do that.

Actually, fstat(2) is portable enough to support a very useful
approximation of FIONREAD.  I know, because I tried it.

Last night I coded up a "waiting" method for file objects that calls
fstat(2) on the associated file descriptor.  For a plain file, it
then subtracts the result of ftell() from the fstat size field and
returns that -- for other files, it simply returns the size field.

I then tested this on plain files, FIFOs, and sockets under Linux. It
turns out fstat(2) gives useful information in all three cases (a
count of characters waiting in the buffer in the latter two).  I expected
this; it should be true under all current Unixes.

fstat(2) does not give useful size-field results for Linux block
devices.  I didn't test the character (terminal) devices.  (I
documented my results in Python's Doc/lib/stat.tex, in a patch I have
already submitted to SourceForge.)
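The `waiting` method described above was a C-level patch. A rough Python-level approximation can be built from `os.fstat` and `stat.S_ISREG`, which are both in the standard library. The sketch assumes a binary-mode file, because text-mode `tell()` returns an opaque cookie rather than a byte offset. POSIX also leaves `st_size` unspecified for anything other than regular files, symbolic links, and shared or typed memory objects. The FIFO and socket results above should therefore be treated as platform-specific observations, not a guarantee.

```python
import os
import stat


def waiting(f):
    """Approximate the number of bytes still readable from binary file f.

    Regular files: file size minus the current logical position.
    Anything else: whatever fstat reports in st_size, which POSIX leaves
    unspecified for such files, so treat it as a hint only.
    """
    st = os.fstat(f.fileno())
    if stat.S_ISREG(st.st_mode):
        # f.tell() accounts for Python's own read buffer, so this counts
        # bytes not yet returned to the caller (clamped if seeked past end).
        return max(st.st_size - f.tell(), 0)
    return st.st_size
```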

I would be quite surprised if the plain-file case didn't work on Mac
and Windows.  I would be a little surprised if the socket case failed,
because all three platforms probably inherited their fstat(2) behavior
on sockets from the ancestral BSD TCP/IP stack.

Just having the plain-file case work would, IMHO, be justification
enough for this method.  If it turns out to be portable across Mac and
Windows sockets as well, *huge* win.  Could this be tested by someone
with access to Windows and Mac systems?
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>
