[Python-bugs-list] [ python-Bugs-524804 ] file iterators: unintuitive behavior
noreply@sourceforge.net
Sat, 02 Mar 2002 07:44:15 -0800
Bugs item #524804, was opened at 2002-03-02 16:44
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=524804&group_id=5470
Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Just van Rossum (jvr)
Assigned to: Nobody/Anonymous (nobody)
Summary: file iterators: unintuitive behavior
Initial Comment:
Given a file created with this snippet:
>>> f = open("tmp.txt", "w")
>>> for i in range(10000):
...     f.write("%s\n" % i)
...
>>> f.close()
Iterating over a file multiple times has unexpected behavior:
>>> f = open("tmp.txt")
>>> for line in f:
...     print line.strip()
...     break
...
0
>>> for line in f:
...     print line.strip()
...     break
...
1861
>>>
I expected the last output line to be 1 instead of 1861.
While I understand the cause (the file iterator uses xreadlines,
which reads a big chunk ahead, leaving the actual filepos out of
sync with the lines handed out so far), this seems to be an
undocumented gotcha. The docs say this:
  [ ... ] Each iteration returns the same result as
  file.readline(), and iteration ends when the readline()
  method returns an empty string.
That is true within one for loop, but not when you break out of
the loop and start another one, which I think is a valid idiom.
Another example of breakage:
f = open(...)
for line in f:
    if somecondition(line):
        break
    ...
data = f.read() # read rest in one slurp
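
Until this is resolved, one way to keep the idiom above working is to
call readline() directly, since readline() and read() share the file
object's own position. The following is only a sketch, written for a
current Python 3 interpreter rather than 2.2; split_at and the sample
data are hypothetical names made up for illustration, and it gives up
the speed of the file iterator.

```python
import io


def split_at(fileobj, stop):
    """Return (lines before the first line where stop(line) is true,
    everything after that line).

    Hypothetical helper: uses readline() only, so the position seen by
    read() afterwards is exactly where the loop stopped.
    """
    head = []
    while True:
        line = fileobj.readline()
        if not line or stop(line):  # end of file, or the marker line
            break
        head.append(line)
    return head, fileobj.read()  # read rest in one slurp


# io.StringIO stands in for a real file opened with open(...)
f = io.StringIO("a\nb\n--\nc\nd\n")
head, rest = split_at(f, lambda line: line.startswith("--"))
```

Here head holds the lines before the marker and rest holds everything
after it, because no second layer of buffering sits between the two
calls.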
The fundamental problem IMO is that the file iterator stacks
*another* state on top of an already stateful object. In a sense a
file object is already an iterator. The two states get out of sync,
causing confusing semantics, to say the least. The current behavior
exposes an implementation detail that should be hidden.
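
The two stacked states can be imitated with a small model. This is a
sketch under stated assumptions, not the actual xreadlines code: it is
written for Python 3, ReadAheadLines is a hypothetical class, and the
8192-byte size hint is a guess at the chunk size rather than a value
taken from the implementation.

```python
import io
from collections import deque


class ReadAheadLines:
    """Hypothetical model of a read-ahead line iterator.

    It keeps its own buffer of lines (the second state) on top of the
    file object's position (the first state).
    """

    def __init__(self, fileobj, sizehint=8192):  # sizehint is assumed
        self.fileobj = fileobj
        self.sizehint = sizehint
        self.buffer = deque()

    def __iter__(self):
        return self

    def __next__(self):
        if not self.buffer:
            # Pull a whole batch of lines; the file position jumps far
            # past the line that is about to be returned.
            self.buffer.extend(self.fileobj.readlines(self.sizehint))
            if not self.buffer:
                raise StopIteration
        return self.buffer.popleft()


def first_line_twice(fileobj, sizehint=8192):
    """Start two separate iterations, as two for loops with break do."""
    first = next(ReadAheadLines(fileobj, sizehint)).strip()
    second = next(ReadAheadLines(fileobj, sizehint)).strip()
    return first, second


# Same contents as tmp.txt in the report, held in memory.
f = io.StringIO("".join("%s\n" % i for i in range(10000)))
first, second = first_line_twice(f)
print(first, second)
```

The first iteration returns 0, but its unread buffer is thrown away
when the loop is abandoned, so the second iteration resumes wherever
the underlying file position was left instead of at 1.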
I understand that speed is a major issue here, so a solution might
not be simple.
Here's a report from an actual user:
http://groups.google.com/groups?hl=en&selm=owen-0B3ECB.10234615022002%40nntp2.u.washington.edu
The rest of the thread suggests possible solutions.
Here's what I *think* should happen (though I'm hardly familiar
with the innards of either the fileobject or xreadlines):
xreadlines should be merged with the file object. The buffer that
xreadlines implements should be *the* buffer for the file object,
and *all* read methods should use *that* buffer and the
corresponding filepos.
Maybe files should grow a .next() method, so iter(f) can return f
itself. .next() and .readline() are then 100% equivalent.
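
The proposed semantics could look roughly like this. It is a sketch
only: SelfIteratingFile is a hypothetical wrapper, not a patch to the
file object; it is written for Python 3, where the method is spelled
__next__ rather than .next(); and it ignores the speed concern by
simply delegating to readline().

```python
import io


class SelfIteratingFile:
    """Hypothetical wrapper in which the file is its own iterator.

    Iteration, readline() and read() all go through the same
    underlying object, so there is only one position to keep track of.
    """

    def __init__(self, fileobj):
        self._f = fileobj

    def readline(self):
        return self._f.readline()

    def read(self):
        return self._f.read()

    def __iter__(self):
        return self  # iter(f) returns f itself

    def __next__(self):
        line = self.readline()  # one step of iteration == readline()
        if not line:
            raise StopIteration
        return line


f = SelfIteratingFile(io.StringIO("0\n1\n2\n3\n"))
seen = []
for line in f:
    seen.append(line.strip())
    break
for line in f:  # a second loop picks up where the first one stopped
    seen.append(line.strip())
    break
rest = f.read()  # read rest in one slurp
```

With this shape the second loop yields 1, and read() returns exactly
the lines that no loop has consumed yet.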