f.readline(), for line in f

Wed May 23 13:52:25 EDT 2007

Hi,

1) Does this make any sense:

"""
Thus, the loop:

     for line in f:

iterates on each line of the file.  Due to buffering issues,
interrupting such a loop prematurely (e.g. with break), or calling
f.next() instead of f.readline(), leaves the file's position set to an
arbitrary value.
"""

The docs say:

"""
next()
A file object is its own iterator, for example iter(f) returns f
(unless f is closed). When a file is used as an iterator, typically in
a for loop (for example, for line in f: print line), the next() method
is called repeatedly. This method returns the next input line, or
raises StopIteration when EOF is hit. In order to make a for loop the
most efficient way of looping over the lines of a file (a very common
operation), the next() method uses a hidden read-ahead buffer. As a
consequence of using a read-ahead buffer, combining next() with other
file methods (like readline()) does not work right. However, using
seek() to reposition the file to an absolute position will flush the
read-ahead buffer. New in version 2.3.
"""

I experimented with this test code:

f = open("aaa.txt", "w")
for i in range(1000):
    f.write("line " + str(i) + "\n")
f.close()

f = open("aaa.txt", "r")
for line in f:
    print f.next()
    print f.readline()
    break

print f.next()
print f.readline()
f.close()

and the output was:

line 1

 922

line 2

line 923

So, it looks like f.readline() is what messes things up--not f.next().
"for line in f" appears to be reading a chunk of the file into a
buffer, and then readline() gets the next line after the chunk.
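The chunk idea can be checked with a little arithmetic. This is only a sketch: the 8192-character chunk size (CHUNK below) is an assumption that is not stated anywhere above, chosen because it happens to agree with the output. The snippet rebuilds the contents of aaa.txt in memory and looks at what comes right after the first 8192 characters:

```python
# Rebuild the exact text the test script writes to aaa.txt.
data = "".join("line " + str(i) + "\n" for i in range(1000))

CHUNK = 8192  # assumed read-ahead size; not taken from the post or the docs

# Whatever follows the first chunk is what an unbuffered readline()
# would return if the iterator had already swallowed CHUNK characters.
rest = data[CHUNK:]
first_after, second_after = rest.split("\n")[:2]

print(repr(first_after))   # ' 922'
print(repr(second_after))  # 'line 923'
```

Under that assumption the chunk boundary falls right after the word "line" of line 922, which would explain the odd " 922" in the output.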

2) Does f.readline() provide any buffering? It doesn't look like it
when I run this code and examine the output:

f = open("aaa.txt", "w")
for i in range(3000):
    f.write("line " + str(i) + "\n")
f.close()

f = open("aaa.txt", "r")
for line in f:
    print f.next()
    print f.readline()

f.close()

The first few lines of the output are:

line 1

 922

line 3

line 923

line 5

line 924

(I assume the skipping from 1 to 3 to 5 is caused by the automatic
call to f.next() when the for loop begins in addition to the explicit
call to f.next() inside the loop.)
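The skipping can be shown without any file at all. A minimal sketch, using a plain list iterator in place of the file and the built-in next() in place of the f.next() method used above:

```python
lines = iter(["line %d\n" % i for i in range(6)])

printed = []
for line in lines:               # implicit next(): consumes lines 0, 2, 4
    printed.append(next(lines))  # explicit next(): consumes lines 1, 3, 5

print(printed)  # ['line 1\n', 'line 3\n', 'line 5\n']
```

Each pass through the loop consumes two items, one for the loop variable and one for the explicit call, so only the odd-numbered lines are ever printed.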

I interpret the output to mean that the chunk of the file put in the
buffer by "for line in f" ends in the middle of line 922, and that
"print f.readline()" is printing the first line past the buffer.
Scrolling down to where "print f.next()" reaches line 922, I see this:

line 919

line 1381

line 921

line 1382

line 1384  <-----**

line 2407  <-----**

which means that when the buffer that was created by "for line in f"
is empty, it is refilled with the next chunk, starting at the file
position where readline() left off.  That indicates that readline()
provides no buffering.  Then, the next call to readline() jumps to a
position after the chunk that was used to replenish the buffer.
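The interpretation above can be written down as a toy model. This is not the real file implementation: ToyFile, its chunk parameter and the tiny 16-character chunk are all made up for illustration, and the model simply assumes that readline() reads straight from the underlying position while iteration refills a private buffer from that same position.

```python
import io


class ToyFile:
    """Toy model only: iteration fills a private read-ahead buffer,
    while readline() reads straight from the underlying position."""

    def __init__(self, data, chunk):
        self._raw = io.StringIO(data)  # stands in for the real file
        self._chunk = chunk            # assumed read-ahead size
        self._buf = ""                 # hidden read-ahead buffer

    def __iter__(self):
        return self

    def __next__(self):
        # Refill from wherever the underlying position currently is.
        while "\n" not in self._buf:
            more = self._raw.read(self._chunk)
            if not more:
                break
            self._buf += more
        if not self._buf:
            raise StopIteration
        line, sep, self._buf = self._buf.partition("\n")
        return line + sep

    def readline(self):
        # Ignores the read-ahead buffer entirely.
        return self._raw.readline()


data = "".join("L%02d\n" % i for i in range(12))  # 12 lines, 4 chars each
f = ToyFile(data, chunk=16)                       # one chunk = 4 lines

from_next, from_readline = [], []
for line in f:
    from_next.append(next(f).strip())
    from_readline.append(f.readline().strip())

print(from_next)      # ['L01', 'L03', 'L07', 'L09']
print(from_readline)  # ['L04', 'L05', 'L10', 'L11']
```

In the model, the refill after L03 starts at L06, directly after the last line readline() returned (L05), and the following readline() jumps to L10, past the new chunk. That is the same pattern as in the output above, although the exact line numbers in a real run depend on how the actual implementation buffers and need not match this model.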