Possible read()/readline() bug?

M.-A. Lemburg mal at egenix.com
Thu Oct 23 17:02:42 CEST 2008

On 2008-10-22 23:00, kdwyer wrote:
> On 22 Oct, 19:54, Mike Kent <mrmak... at cox.net> wrote:
>> Before I file a bug report against Python 2.5.2, I want to run this by
>> the newsgroup to make sure I'm not being stupid.
>> I have a text file of fixed-length records I want to read in random
>> order.  That file is being changed in real-time by another process,
>> and my process wants to see the changes to the file.  What I'm seeing
>> is that, once I've opened the file and read a record, all subsequent
>> seeks to and reads of that same record will return the same data as
>> the first read of the record, so long as I don't close and reopen the
>> file.  This indicates some sort of buffering and caching is going on.

The C library's stdio layer buffers file reads, and you are seeing the
effects of this.

Try opening the file unbuffered: f = open('foo.txt', 'r', 0)
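
A minimal sketch of how that could look for fixed-length records. This is
only an illustration, not tested against your setup: RECORD_SIZE, the
read_record() helper and the temporary file are made up for the example. It
is written so that it also runs on a current Python 3, where unbuffered mode
is only allowed for binary files (hence 'rb' and byte strings); on 2.5 the
open() call above works as is.

```python
import os
import tempfile

RECORD_SIZE = 4  # hypothetical fixed record length in bytes, newline included


def read_record(f, index, record_size=RECORD_SIZE):
    """Seek to record `index` and read it through the unbuffered handle."""
    f.seek(index * record_size)
    return f.read(record_size)


# Throwaway test file with two made-up records.
fd, path = tempfile.mkstemp()
os.close(fd)
w = open(path, 'wb')
w.write(b'hi-\nyo-\n')
w.close()

f = open(path, 'rb', 0)      # third argument 0 = no buffering
first = read_record(f, 0)

# Stand-in for the other process: overwrite record 0 in place.
w = open(path, 'r+b')
w.write(b'bye\n')
w.close()

second = read_record(f, 0)   # same handle, no close/reopen
f.close()
os.remove(path)
```

With buffering off, every read() goes to the operating system, so the second
read_record() call should see the record as rewritten by the other writer.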


>> Consider the following:
>> $ echo "hi" >foo.txt  # Create my test file
>> $ python2.5              # Run Python
>> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
>> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> f = open('foo.txt')  # Open my test file
>> >>> f.seek(0)            # Seek to the beginning of the file
>> >>> f.readline()         # Read the line, I get the data I expected
>> 'hi\n'
>> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.  'foo.txt' now has been changed
>> >>> # on the disk, and now contains 'bye\n'.
>> >>> f.seek(0)            # Seek to the beginning of the still-open file
>> >>> f.readline()         # Read the line, I don't get 'bye\n', I get the original data, which is no longer there.
>> 'hi\n'
>> >>> f.close()            # Now I close the file...
>> >>> f = open('foo.txt')  # ... and reopen it
>> >>> f.seek(0)            # Seek to the beginning of the file
>> >>> f.readline()         # Read the line, I get the expected 'bye\n'
>> 'bye\n'
>> It seems pretty clear to me that this is wrong.  If there is any
>> caching going on, it should clearly be discarded if I do a seek.  Note
>> that it's not just readline() that's returning me the wrong, cached
>> data, as I've also tried this with read(), and I get the same
>> results.  It's not acceptable that I have to close and reopen the file
>> before every read when I'm doing random record access.
>> So, is this a bug, or am I being stupid?
> Hello Mike,
> I'm guessing that this is not a bug.  I'm no expert, but I'd guess
> that the open(file, mode) function simply loads the file into memory,
> and that further operations (such as seek or read) are performed on
> the in-memory data rather than the data on disk.  Thus changes to the
> file are only observed after a fresh open operation.
> This behaviour is probably enforced by the C library on the machine
> that you are using.  If you want to be able to pick up data changes
> like this then you're better off using a database package that has
> support for concurrent access, locking and transactions.
> Cheers,
> Kev

Marc-Andre Lemburg

