Possible read()/readline() bug?

kdwyer kevin.p.dwyer at gmail.com
Wed Oct 22 17:00:35 EDT 2008


On 22 Oct, 19:54, Mike Kent <mrmak... at cox.net> wrote:
> Before I file a bug report against Python 2.5.2, I want to run this by
> the newsgroup to make sure I'm not being stupid.
>
> I have a text file of fixed-length records I want to read in random
> order.  That file is being changed in real-time by another process,
> and my process wants to see the changes to the file.  What I'm seeing
> is that, once I've opened the file and read a record, all subsequent
> seeks to and reads of that same record will return the same data as
> the first read of the record, so long as I don't close and reopen the
> file.  This indicates some sort of buffering and caching is going on.
>
> Consider the following:
>
> $ echo "hi" >foo.txt  # Create my test file
> $ python2.5              # Run Python
> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> f = open('foo.txt')  # Open my test file
> >>> f.seek(0)                # Seek to the beginning of the file
> >>> f.readline()             # Read the line, I get the data I expected
> 'hi\n'
> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.  'foo.txt' now has been changed
> >>> # on the disk, and now contains 'bye\n'.
> >>> f.seek(0)                # Seek to the beginning of the still-open file
> >>> f.readline()             # Read the line, I don't get 'bye\n', I get the original data, which is no longer there.
> 'hi\n'
> >>> f.close()                 # Now I close the file...
> >>> f = open('foo.txt') # ... and reopen it
> >>> f.seek(0)               # Seek to the beginning of the file
> >>> f.readline()            # Read the line, I get the expected 'bye\n'
> 'bye\n'
>
> It seems pretty clear to me that this is wrong.  If there is any
> caching going on, it should clearly be discarded if I do a seek.  Note
> that it's not just readline() that's returning me the wrong, cached
> data, as I've also tried this with read(), and I get the same
> results.  It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access.
>
> So, is this a bug, or am I being stupid?

Hello Mike,

I'm guessing that this is not a bug.  I'm no expert, but I'd guess
that the file object returned by open(file, mode) reads through an
in-memory buffer, and that later operations (such as seek or read) can
be served from that buffer rather than from the data on disk.  Thus
changes to the file may go unnoticed until the buffer is refilled, for
example after a fresh open.
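
If the stale data does come from a user-space buffer or in-memory copy,
one way to test that guess is to open the file with buffering disabled,
so that every read after a seek goes to the operating system.  The
following is only a rough sketch written for a current Python 3, not
the 2.5.2 interpreter from the report; the record length, the scratch
file and the helper names read_record and demo are made up for
illustration.

```python
import os
import tempfile

RECORD_LEN = 4  # hypothetical fixed record length, newline included


def read_record(f, index, record_len=RECORD_LEN):
    """Seek to a fixed-length record and read it from the file object."""
    f.seek(index * record_len)
    return f.read(record_len)


def demo():
    # Create a scratch file standing in for foo.txt.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as w:
            w.write(b"hi!\n")
        # buffering=0 is only allowed in binary mode; it returns a raw,
        # unbuffered file object, so reads are not served from a cache
        # held by the file object.
        f = open(path, "rb", buffering=0)
        try:
            first = read_record(f, 0)
            # Stand-in for the other process rewriting the record in place.
            with open(path, "r+b") as w:
                w.write(b"bye\n")
            second = read_record(f, 0)
        finally:
            f.close()
        return first, second
    finally:
        os.remove(path)
```

Calling demo() reads the same record twice through one open file
object, once before and once after the rewrite, without closing and
reopening it in between.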

This behaviour is probably enforced by the C library on the machine
that you are using.  If you want to be able to pick up data changes
like this then you're better off using a database package that has
support for concurrent access, locking and transactions.
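
As a rough illustration of the database route, here is a sketch using
the sqlite3 module from the standard library, picked only as an
example of such a package.  The table name records, its columns and
the helpers read_record and demo are assumptions for the sake of the
example, and the two connections merely stand in for the two
processes.

```python
import os
import sqlite3
import tempfile


def read_record(conn, record_id):
    """Return the current data for one record, or None if it is missing."""
    rows = conn.execute(
        "SELECT data FROM records WHERE id = ?", (record_id,)
    ).fetchall()
    return rows[0][0] if rows else None


def demo():
    # Scratch database file; a real setup would use a shared, known path.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    writer = sqlite3.connect(path)  # stands in for the updating process
    reader = sqlite3.connect(path)  # stands in for the reading process
    try:
        writer.execute(
            "CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT)"
        )
        writer.execute("INSERT INTO records (id, data) VALUES (0, 'hi')")
        writer.commit()
        first = read_record(reader, 0)
        # The writer changes the record and commits the transaction.
        writer.execute("UPDATE records SET data = 'bye' WHERE id = 0")
        writer.commit()
        # The reader queries again on the same, still-open connection.
        second = read_record(reader, 0)
        return first, second
    finally:
        writer.close()
        reader.close()
        os.remove(path)
```

Here the reader keeps its connection open throughout and should see
the new value once the writer has committed.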

Cheers,

Kev


