Possible read()/readline() bug?

Mike Kent mrmakent at cox.net
Wed Oct 22 20:54:23 CEST 2008

Before I file a bug report against Python 2.5.2, I want to run this by
the newsgroup to make sure I'm not being stupid.

I have a text file of fixed-length records I want to read in random
order.  That file is being changed in real-time by another process,
and my process wants to see the changes to the file.  What I'm seeing
is that, once I've opened the file and read a record, all subsequent
seeks to and reads of that same record will return the same data as
the first read of the record, so long as I don't close and reopen the
file.  This indicates some sort of buffering and caching is going on.
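
To make the access pattern concrete, here is a minimal sketch of the kind
of record read I mean.  The 16-byte record length, the file name and the
helper name read_record are made up for illustration.  The file is opened
in binary mode with buffering turned off (third argument 0), on the
assumption that this keeps the file object from holding on to a stale
copy of the data.  The sketch is written for a current Python 3; whether
2.5.2 behaves the same way is an assumption, not something shown here.

```python
import os
import tempfile

RECORD_LEN = 16  # hypothetical fixed record length, including the newline


def read_record(f, index):
    """Seek to record number `index` and return its raw bytes."""
    f.seek(index * RECORD_LEN)
    return f.read(RECORD_LEN)


# Build a small demo file of three fixed-length records.
path = os.path.join(tempfile.mkdtemp(), "records.txt")
with open(path, "wb") as out:
    for i in range(3):
        text = ("record %d" % i).ljust(RECORD_LEN - 1)
        out.write(text.encode("ascii") + b"\n")

# Third argument 0 asks for an unbuffered file object.
reader = open(path, "rb", 0)
first = read_record(reader, 1)

# Stand-in for the other process: overwrite record 1 in place.
with open(path, "r+b") as writer:
    writer.seek(1 * RECORD_LEN)
    writer.write(b"changed".ljust(RECORD_LEN - 1) + b"\n")

# Same open file, same record: the second read should see the change.
second = read_record(reader, 1)
reader.close()
```

The point of the sketch is only that the reader is never closed between
the two calls to read_record, which is the behaviour I am after.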

Consider the following:

$ echo "hi" >foo.txt  # Create my test file
$ python2.5              # Run Python
Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('foo.txt')  # Open my test file
>>> f.seek(0)                # Seek to the beginning of the file
>>> f.readline()             # Read the line, I get the data I expected
>>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.  'foo.txt' now has been changed
>>> # on the disk, and now contains 'bye\n'.
>>> f.seek(0)                # Seek to the beginning of the still-open file
>>> f.readline()             # Read the line, I don't get 'bye\n', I get the original data, which is no longer there.
>>> f.close()                 # Now I close the file...
>>> f = open('foo.txt') # ... and reopen it
>>> f.seek(0)               # Seek to the beginning of the file
>>> f.readline()            # Read the line, I get the expected 'bye\n'

It seems pretty clear to me that this is wrong.  If there is any
caching going on, it should clearly be discarded if I do a seek.  Note
that it's not just readline() that's returning me the wrong, cached
data, as I've also tried this with read(), and I get the same
results.  It's not acceptable that I have to close and reopen the file
before every read when I'm doing random record access.
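
One possible way around closing and reopening, sketched under the
assumption that the stale data comes from a buffer sitting above the
operating system, is to bypass the file object and use the os module's
descriptor-level calls (os.open, os.lseek, os.read).  The script below
replays the session above in a single process on a current Python 3; the
temporary path and the 100-byte read size are arbitrary choices for the
demo, and rewriting the file from the same script stands in for the echo
in the other shell.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# Create the test file, like: echo "hi" >foo.txt
with open(path, "w") as out:
    out.write("hi\n")

# Open a raw file descriptor; no file-object buffer is involved.
fd = os.open(path, os.O_RDONLY)
os.lseek(fd, 0, os.SEEK_SET)
before = os.read(fd, 100)

# Stand-in for running: echo "bye" >foo.txt in another shell.
with open(path, "w") as out:
    out.write("bye\n")

# Seek back to the start of the still-open descriptor and read again.
os.lseek(fd, 0, os.SEEK_SET)
after = os.read(fd, 100)
os.close(fd)
```

Here the descriptor stays open across the rewrite, and each read goes
to the operating system rather than to a cached copy.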

So, is this a bug, or am I being stupid?
