feof status (was: Re: [Python-Dev] Rehabilitating fgets)

Guido van Rossum guido@python.org
Mon, 08 Jan 2001 13:30:34 -0500


Eric, take a hint.  You're not going to get your eof() method no
matter what arguments you bring up.  But I'll explain it to you again
anyway... :-)

> Guido van Rossum <guido@python.org>:
> > Eric, before we go further, can you give an exact definition of
> > EOFness to me?

[Eric]
> A file is at EOF when attempts to read more data from it will fail
> returning no data.

I was afraid you would say this.  That's not a condition that's easy
to calculate without doing I/O, *and* that's not the condition that
you are interested in for your problem.  According to your definition,
f.eof() should be true in this example:

    f = open("/etc/passwd")
    f.seek(0, 2)                 # Seek to end of file
    print f.eof()                # What will this print???
    print `f.readline()`         # Will print ''

But getting the right result here requires a lot of knowledge about
how the file is implemented!  While you've explained how this can be
implemented on Unix, it can't be implemented with just the tools that
stdio gives us.  Going beyond stdio in order to implement a feature is
a grave decision.  After all, Python is portable to many
less-than-mainstream operating systems (VxWorks, OS/9, VMS...).  Now,
if this was just a speed hack (like xreadlines) I could accept having
some platform-dependent code, if at least there was a portable way to
do it that was just a bit slower.  But here you can't convince me that
this can be done in a portable way, and I don't want to force porters
to figure out how to do this for their platform before their port can
work.  I also don't want to make f.eof() a non-portable feature: *if*
it is provided, it's too important for that.

Note that stdio's feof() doesn't have this definition!  It is set when
the last *read* (or getc(), etc.) stumbled upon an EOF condition.
That's also of limited value; it's mostly defined so you can
distinguish between errors and EOF when you get a short read.  The
stdio feof() flag would be false in the above example.

> > What's wrong with just setting the parser loose on the input and
> > letting it deal with EOF?
> 
> Nothing wrong in theory, but it's a problem in practice.  I don't want
> to import the second parser unless it's actually needed, because it's much
> larger than the first one.

So be practical and let the first parser set a global flag that tells
you whether it's necessary to load the second one.

> >                                In your example, apparently a line
> > containing the word "history" signals that the rest of the file must
> > be parsed by the second parser.  What if "history" is the last line of
> > the file?  The eof() test can't tell you *that*!
> 
> Right.  That case never happens.  I mean it *really* never happens :-).
> 
> What we're talking about is a game system.  The first parser recognizes
> a spec language for describing games of a particular class (variants of
> Diplomacy, if that's meaningful to you).  The system keeps logfiles which
> consist of a section in the game description language, optionally
> followed by the token "history" and an order log.
> 
> The parser for the order log language is a *lot* larger than the one
> for the description language.  This is why I said I don't want the
> first parser to just call the second.  I want to test for EOF to
> know whether I have to import the second parser at all!
> 
> Here's the beginning of my problem: the first parser can't export a line
> buffer, because it doesn't *have* a line buffer.  It's a subclass of
> shlex and does single-character reads.
> 
> There are two ways I can cope with this.  One is to do a (nonzero)
> length read after the first parser exits; the other is to have the
> first parser set a state flag controlling whether the second parser
> loads.

Do the latter.  Nothing wrong with it that I can see.
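
A minimal sketch of the state-flag approach, assuming a shlex subclass
along the lines Eric describes.  The class name DescriptionLexer, the
method parse_description() and the attribute saw_history are made up
for illustration; only shlex.shlex, get_token() and the eof attribute
come from the standard library.

```python
import io
import shlex


class DescriptionLexer(shlex.shlex):
    """Hypothetical first-stage parser that records whether it saw 'history'."""

    def __init__(self, instream):
        super().__init__(instream)
        self.saw_history = False  # state flag checked by the caller

    def parse_description(self):
        """Collect tokens until end of input or the 'history' marker."""
        tokens = []
        while True:
            tok = self.get_token()
            if tok == self.eof:
                break  # input ran out: no order log follows
            if tok == "history":
                self.saw_history = True  # caller should load the second parser
                break
            tokens.append(tok)
        return tokens


def parse_logfile(stream):
    """Return (description tokens, whether the second parser is needed)."""
    lexer = DescriptionLexer(stream)
    tokens = lexer.parse_description()
    return tokens, lexer.saw_history
```

The caller would import the larger order-log parser only when the
second element of the result is true, so no EOF test on the stream is
needed.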

> This is where it bites that I can't test for EOF with a read(0).

And can you tell me a system where you *can* test for EOF with a
read(0)?  I've never heard of such a thing.  The Unix read() system
call has the same properties as Python's f.read().  I'm pretty sure
that fread() with a zero count also doesn't give you the information
you're after.
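
To make the point concrete, here is a small sketch in present-day
Python showing that a zero-length read returns the same empty result
in the middle of a file and at its end, so it cannot serve as an EOF
test.  The helper name read_zero_probe() is invented for this example,
and a temporary file stands in for a real log file.

```python
import tempfile


def read_zero_probe():
    """Return what read(0) yields mid-file and at end of file, plus a real read."""
    with tempfile.TemporaryFile() as f:
        f.write(b"some data\n")
        f.seek(0)
        mid_file = f.read(0)   # data is available, yet nothing is returned
        f.seek(0, 2)           # seek to end of file
        at_end = f.read(0)     # indistinguishable from the mid-file result
        real_read = f.read(1)  # only a nonzero read reveals EOF (empty result)
    return mid_file, at_end, real_read
```

Only the nonzero-length read at the end carries any information, and
that is exactly the read Eric wants to avoid.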

> The
> second shlex parser only has token-level pushback!  If I do a
> nonzero-length read and I get data, I'm screwed.  On the other hand
> (as I said before) setting a lexer state flag seems wrong, because
> EOFness is a property of the underlying stream rather than the parser.
> I'd be duplicating state that exists in the stdio stream structure
> anyway; it ought to be accessible.

Bullshit.  The EOFness that you're after (according to your own
definition) is not the same as the EOFness of the stdio stream.  The
EOFness in the stdio stream could help you, but Python resets it -- so
that making it available wouldn't be as easy as you claim.  Anyway,
you seem to have a sufficiently vague idea of what "EOFness" means
that I don't think providing access to whatever low-level EOFness
condition might exist would do you much good.

> > > Now, another and more general way to handle this would be to make an
> > > equivalent of the old FIONREAD ioctl part of Python's standard set of
> > > file object methods -- a way to ask "how many bytes are ready to be
> > > read in this stream?"
> > 
> > There's no portable way to do that.
> 
> Actually, fstat(2) is portable enough to support a very useful
> approximation of FIONREAD.  I know, because I tried it.
> 
> Last night I coded up a "waiting" method for file objects that calls
> fstat(2) on the associated file descriptor.  For a plain file, it
> then subtracts the result of ftell() from the fstat size field and
> returns that -- for other files, it simply returns the size field.
> 
> I then tested this on plain files, FIFOs, and sockets under Linux. It
> turns out fstat(2) gives useful information in all three cases (a
> count of characters waiting in the buffer in the latter two).  I expected
> this; it should be true under all current Unixes.
> 
> fstat(2) does not give useful size-field results for Linux block
> devices.  I didn't test the character (terminal) devices.  (I
> documented my results in Python's Doc/lib/stat.tex, in a patch I have
> already submitted to SourceForge.)
> 
> I would be quite surprised if the plain-file case didn't work on Mac
> and Windows.  I would be a little surprised if the socket case failed,
> because all three probably inherited fstat(2) from the ancestral BSD
> TCP/IP stack.
> 
> Just having the plain-file case work would, IMHO, be justification
> enough for this method.  If it turns out to be portable across Mac and
> Windows sockets as well, *huge* win.  Could this be tested by someone
> with access to Windows and Mac systems?
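
For readers following along, the "waiting" calculation described in
the quoted text can be approximated in present-day Python as below.
This is a sketch of the described behaviour, not the submitted patch:
the free function waiting() is a stand-in for the proposed file
method, and whether st_size is meaningful for FIFOs and sockets
depends on the platform, as the quoted text itself notes.

```python
import os
import stat


def waiting(f):
    """Rough count of bytes still readable from f, based on fstat()."""
    st = os.fstat(f.fileno())
    if stat.S_ISREG(st.st_mode):
        # Plain file: total size minus the current read position.
        return st.st_size - f.tell()
    # Other file types: report the size field as-is (platform-dependent).
    return st.st_size
```

For a plain file, waiting(f) == 0 corresponds to the position being at
the end of the file as fstat() currently sees it; a file that is still
being appended to can make that answer stale.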

I don't see the huge win.

--Guido van Rossum (home page: http://www.python.org/~guido/)