f.readline(), for line in f
7stud
bbxx789_05ss at yahoo.com
Wed May 23 13:52:25 EDT 2007
Hi,
1) Does this make any sense:
"""
Thus, the loop:
for line in f:
iterates on each line of the file. Due to buffering issues,
interrupting such a loop prematurely (e.g. with break), or calling
f.next() instead of f.readline(), leaves the file's position set to an
arbitrary value.
"""
The docs say:
"""
next( )
A file object is its own iterator, for example iter(f) returns f
(unless f is closed). When a file is used as an iterator, typically in
a for loop (for example, for line in f: print line), the next() method
is called repeatedly. This method returns the next input line, or
raises StopIteration when EOF is hit. In order to make a for loop the
most efficient way of looping over the lines of a file (a very common
operation), the next() method uses a hidden read-ahead buffer. As a
consequence of using a read-ahead buffer, combining next() with other
file methods (like readline()) does not work right. However, using
seek() to reposition the file to an absolute position will flush the
read-ahead buffer. New in version 2.3.
"""
I experimented with this test code:
f = open("aaa.txt", "w")
for i in range(1000):
    f.write("line " + str(i) + "\n")
f.close()
f = open("aaa.txt", "r")
for line in f:
    print f.next()
    print f.readline()
    break
print f.next()
print f.readline()
f.close()
and the output was:
line 1
922
line 2
line 923
So, it looks like f.readline() is what messes things up--not f.next().
"for line in f" appears to be reading a chunk of the file into a
buffer, and then readline() gets the next line after the chunk.
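To check that this reading is at least self-consistent, here is a toy model of it, not the real file object. ReadAheadModel is a made-up class, and the 8192-byte chunk size is an assumption picked only because it reproduces the numbers above. In the model, next() drains a private buffer while readline() reads at the underlying position.

```python
import io

READAHEAD_SIZE = 8192  # assumption: picked because it reproduces the output above


class ReadAheadModel(object):
    """Toy stand-in for a file whose next() and readline() share no buffer."""

    def __init__(self, data):
        self.raw = io.BytesIO(data)  # plays the role of the real file position
        self.buf = b""               # plays the role of the hidden read-ahead buffer

    def next(self):
        if b"\n" not in self.buf:
            self.buf += self.raw.read(READAHEAD_SIZE)
        if not self.buf:
            raise StopIteration
        line, sep, rest = self.buf.partition(b"\n")
        self.buf = rest
        return line + sep

    def readline(self):
        # Reads at the underlying position and never looks at self.buf.
        return self.raw.readline()


data = b"".join(("line %d\n" % i).encode("ascii") for i in range(1000))

f = ReadAheadModel(data)
f.next()  # the line that "for line in f" itself consumes (line 0)
observed = [f.next(), f.readline(), f.next(), f.readline()]
```

With these assumptions, observed holds line 1, the tail of line 922, line 2 and line 923, in the same order as the output above. The first 8192 bytes of the file end four bytes into line 922, which is why the first readline() returns only a fragment.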
2) Does f.readline() provide any buffering? It doesn't look like it
when I run this code and examine the output:
f = open("aaa.txt", "w")
for i in range(3000):
    f.write("line " + str(i) + "\n")
f.close()
f = open("aaa.txt", "r")
for line in f:
    print f.next()
    print f.readline()
f.close()
The first few lines of the output are:
line 1
922
line 3
line 923
line 5
line 924
(I assume the skipping from 1 to 3 to 5 is caused by the automatic
call to f.next() when the for loop begins in addition to the explicit
call to f.next() inside the loop.)
I interpret the output to mean that the chunk of the file put in the
buffer by "for line in f" ends in the middle of line 922, and "print
f.readline()" is printing the first line past the buffer. Scrolling
down to where "print f.next()" reaches line 922, I see this:
line 919
line 1381
line 921
line 1382
line 1384 <-----**
line 2407 <-----**
which means that when the buffer that was created by "for line in f"
is empty, it is refilled with the next chunk, starting directly after
the file position left by the last readline() call. That indicates
that readline() provides no buffering. Then, the next call to
readline() jumps to a position after the chunk that was used to
replenish the buffer.
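The refill behaviour described above can be replayed with a small simulation. This is only a sketch under two assumptions that are not confirmed anywhere in this post: the read-ahead chunk is 8192 bytes, and when a partial line is left over in the buffer the refill asks for a quarter more. ReadAheadModel is a made-up class, and both numbers were chosen only because they reproduce the output shown.

```python
import io

READAHEAD_SIZE = 8192  # assumption, as is the 25% growth used in next() below


class ReadAheadModel(object):
    """Toy model: next() uses a private buffer, readline() bypasses it."""

    def __init__(self, data):
        self.raw = io.BytesIO(data)  # underlying file position
        self.buf = b""               # hidden read-ahead buffer

    def next(self):
        size = READAHEAD_SIZE
        while b"\n" not in self.buf:
            if self.buf:
                size += size >> 2    # assumed: leftover partial line, ask for 25% more
            chunk = self.raw.read(size)  # refill from wherever readline() left off
            if not chunk:
                break
            self.buf += chunk
        if not self.buf:
            raise StopIteration
        line, sep, rest = self.buf.partition(b"\n")
        self.buf = rest
        return line + sep

    def readline(self):
        return self.raw.readline()


data = b"".join(("line %d\n" % i).encode("ascii") for i in range(3000))
f = ReadAheadModel(data)
pairs = []  # one (f.next() value, f.readline() value) per pass of the loop
while True:
    try:
        f.next()              # the line consumed by "for line in f"
        from_next = f.next()  # the explicit f.next() in the loop body
    except StopIteration:
        break
    pairs.append((from_next, f.readline()))
```

Under those assumptions, pairs[460] is (line 921, line 1382) and pairs[461] is (line 1384, line 2407), which matches the jump marked with arrows above. With a fixed 8192-byte refill the model would not land on line 2407, which is the only reason the growth step is in there.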