Possible read()/readline() bug?
kdwyer
kevin.p.dwyer at gmail.com
Wed Oct 22 17:00:35 EDT 2008
On 22 Oct, 19:54, Mike Kent <mrmak... at cox.net> wrote:
> Before I file a bug report against Python 2.5.2, I want to run this by
> the newsgroup to make sure I'm not being stupid.
>
> I have a text file of fixed-length records I want to read in random
> order. That file is being changed in real-time by another process,
> and my process wants to see the changes to the file. What I'm seeing
> is that, once I've opened the file and read a record, all subsequent
> seeks to and reads of that same record will return the same data as
> the first read of the record, so long as I don't close and reopen the
> file. This indicates some sort of buffering and caching is going on.
>
> Consider the following:
>
> $ echo "hi" >foo.txt # Create my test file
> $ python2.5 # Run Python
> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> f = open('foo.txt') # Open my test file
> >>> f.seek(0) # Seek to the beginning of the file
> >>> f.readline() # Read the line, I get the data I expected
> 'hi\n'
> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'. 'foo.txt' now has been changed
> >>> # on the disk, and now contains 'bye\n'.
> >>> f.seek(0) # Seek to the beginning of the still-open file
> >>> f.readline() # Read the line, I don't get 'bye\n', I get the original data, which is no longer there.
> 'hi\n'
> >>> f.close() # Now I close the file...
> >>> f = open('foo.txt') # ... and reopen it
> >>> f.seek(0) # Seek to the beginning of the file
> >>> f.readline() # Read the line, I get the expected 'bye\n'
> 'bye\n'
>
> It seems pretty clear to me that this is wrong. If there is any
> caching going on, it should clearly be discarded if I do a seek. Note
> that it's not just readline() that's returning me the wrong, cached
> data, as I've also tried this with read(), and I get the same
> results. It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access.
>
> So, is this a bug, or am I being stupid?
Hello Mike,
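I'm guessing that this is not a bug. I'm no expert, but I don't think
that open(file, mode) loads the whole file into memory. As far as I
know, a Python 2.5 file object is a thin wrapper around a C stdio
stream, and stdio reads the file in blocks and keeps the most recent
block in a buffer. A seek to a position that is still inside that
buffer may be satisfied without going back to the disk, so the read
that follows returns the old bytes. A fresh open starts with an empty
buffer, which is why the change only shows up after you reopen.
This behaviour probably comes from the C library on the machine that
you are using rather than from Python itself. If you want to pick up
changes made by another process, you could try opening the file
unbuffered (the third argument to open() is the buffer size, and 0
means no buffering). For anything beyond simple cases you're probably
better off using a database package that has support for concurrent
access, locking and transactions.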
Cheers,
Kev