[Python-Dev] Single- vs. Multi-pass iterability
Oren Tirosh
oren-py-d@hishome.net
Thu, 11 Jul 2002 02:15:28 -0400
On Wed, Jul 10, 2002 at 09:10:18PM -0400, Guido van Rossum wrote:
> > Then you couldn't do this:
> >
> > done = False
> > for line in f:
> >     if not check(line):
> >         break
> >     process(line)
> > else:
> >     done = True
> >
> > if not done:
> >     for line in file:
> >         another_process(line)
>
> That's already broken, see SF bug 524804.
xreadlines is buffered: it reads ahead in large chunks, so after you break out
of the loop the underlying file position can be well past the last line you
were given. If you use xreadlines explicitly you should expect that. It is
therefore a bit surprising that file.__iter__ implicitly returns an
xreadlines object.
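To make the buffering point concrete, here is a minimal sketch of a read-ahead
line iterator. BufferedLines and its chunk_size parameter are hypothetical
stand-ins, not the real xreadlines code; the only assumption is that lines are
handed out from a buffer filled by large read() calls.

```python
import io


class BufferedLines:
    """Hypothetical read-ahead line iterator, standing in for xreadlines.

    It fills a buffer with large read() calls and hands out lines from it,
    so the underlying file position runs ahead of the lines returned.
    """

    def __init__(self, f, chunk_size=64):
        self.f = f
        self.chunk_size = chunk_size
        self.lines = []      # complete lines waiting to be returned
        self.partial = ''    # trailing text not yet terminated by '\n'

    def __iter__(self):
        return self

    def __next__(self):
        while not self.lines:
            chunk = self.f.read(self.chunk_size)
            if not chunk:
                # End of file: flush an unterminated last line, if any.
                if self.partial:
                    line, self.partial = self.partial, ''
                    return line
                raise StopIteration
            pieces = (self.partial + chunk).split('\n')
            self.partial = pieces.pop()   # text after the last '\n'
            self.lines = [piece + '\n' for piece in pieces]
        return self.lines.pop(0)


f = io.StringIO("a\nb\nc\nd\n")
seen = []
for line in BufferedLines(f):
    seen.append(line)
    if line == "b\n":
        break

# The first chunk swallowed all 8 characters, so a second reader finds nothing.
leftover = f.readline()
```

With chunk_size=1 the same loop leaves the position right after "b\n" and
leftover becomes "c\n", which is the behavior the quoted code relies on.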
What's the reason for using xreadlines as a file iterator? Was it
performance or was it just the easiest way to implement it using an existing
object?
The documentation says: "Files support the iterator protocol. Each iteration
returns the same result as file.readline()."
This is not correct. Files support what I call the iterable protocol. Objects
supporting the iterator protocol have a .next() method; files don't. While
it's true that each iteration returns the same result as readline(), it
doesn't have the same side effects: the file position is not left just after
the line returned.
Proposal: make files really support the iterator protocol. __iter__ would
return self, and next() would call readline() and raise StopIteration when it
returns ''. Anyone who wants the xreadlines performance improvement should ask
for it explicitly.
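A minimal sketch of the proposed semantics, written as a wrapper around any
object with a readline() method. ReadlineIterator and check() are hypothetical
names for illustration; the sketch uses Python 3's __next__ and aliases it to
next to match the spelling above.

```python
import io


class ReadlineIterator:
    """Hypothetical wrapper with the proposed semantics: __iter__ returns
    self, and each step calls readline(), stopping when it returns ''."""

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def __next__(self):
        line = self.f.readline()
        if line == '':
            raise StopIteration
        return line

    next = __next__   # the Python 2 spelling used in this thread


def check(line):
    # Hypothetical predicate, for illustration only.
    return not line.startswith('#')


f = io.StringIO("x\ny\n# stop\nz\n")
first = []
for line in ReadlineIterator(f):
    if not check(line):
        break
    first.append(line)

# Because every step is a plain readline(), the file position sits right
# after the line that triggered the break, so a second loop resumes there.
rest = list(ReadlineIterator(f))
```

Since all state lives in the file object itself, the quoted two-loop pattern
works: first is ['x\n', 'y\n'] and rest is ['z\n'].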
definitions:
iterable := hasattr(obj, '__iter__')
iterator := hasattr(obj, '__iter__') and hasattr(obj, 'next')
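The two definitions translate directly into Python. This sketch keeps the
hasattr tests as written; the only change is that it also accepts __next__,
the name Python 3 later gave to next(). Like the definitions, it ignores
old-style sequences that iter() accepts through __getitem__ alone.

```python
def is_iterable(obj):
    # The definition above: anything with an __iter__ method.
    return hasattr(obj, '__iter__')


def is_iterator(obj):
    # Iterable, plus a next() method (spelled __next__ in Python 3).
    return is_iterable(obj) and (hasattr(obj, 'next') or hasattr(obj, '__next__'))


print(is_iterable([1, 2]), is_iterator([1, 2]))   # True False
print(is_iterator(iter([1, 2])))                  # True
```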
If an object is iterable but not an iterator, it would be reasonable to expect
that it is also re-iterable. I don't know whether this should be a
requirement, but I think it would be a good idea for all builtin objects to
conform to it anyway. Currently files are the only builtin type that is
iterable but neither an iterator nor re-iterable.
explicit-is-better-than-implicit-ly yours,
Oren