[Python-Dev] Single- vs. Multi-pass iterability

Oren Tirosh oren-py-d@hishome.net
Thu, 11 Jul 2002 02:15:28 -0400


On Wed, Jul 10, 2002 at 09:10:18PM -0400, Guido van Rossum wrote:
> > Then you couldn't do this:
> > 
> >     done = False
> >     for line in f:
> >         if not check(line):
> >             break
> >         process(line)
> >     else:
> >         done = True
> > 
> >     if not done:
> >         for line in f:
> >             another_process(line)
> 
> That's already broken, see SF bug 524804.

Xreadlines is buffered: it reads ahead in blocks, so when a loop exits early
the file position is already past the last line returned, which leaves the
file in an unexpected state. If you use xreadlines explicitly you should
expect that. The fact that file.__iter__ implicitly returns an xreadlines
object is therefore a bit surprising.
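To make the read-ahead problem concrete, here is a simplified model in
Python. It is not the actual xreadlines code (that was implemented in C);
`chunked_lines` and its `sizehint` default are hypothetical, chosen only to
mimic "read a block of lines, then hand them out one at a time":

```python
import io

def chunked_lines(f, sizehint=8192):
    """Hypothetical read-ahead line iterator, similar in spirit to xreadlines.

    It pulls a block of lines with f.readlines(sizehint) and then yields
    them one at a time, so the file position runs ahead of the caller.
    """
    while True:
        chunk = f.readlines(sizehint)
        if not chunk:
            return
        for line in chunk:
            yield line

f = io.StringIO("a\nb\nc\n")
for line in chunked_lines(f):
    break                   # the caller has consumed only "a\n" ...
print(repr(f.readline()))   # ... yet this prints '': "b\n" and "c\n" went into the discarded chunk
```

A second loop over the plain file object after the `break` would see nothing,
which is exactly the surprise in the quoted example.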

What's the reason for using xreadlines as the file iterator? Was it for
performance, or was it just the easiest way to implement it using an existing
object?

"Files support the iterator protocol. Each iteration returns the same
result as file.readline()"

This is not correct. Files support what I call the iterable protocol. Objects
supporting the iterator protocol have a .next() method; files don't. While
it's true that each iteration returns the same result as readline(), it
doesn't have the same side effects: afterwards the file position may be well
past the last line returned.

Proposal: make files really support the iterator protocol. __iter__ would
return self, and next() would call readline() and raise StopIteration when it
returns ''. If anyone wants the xreadlines performance improvement, they
should ask for it explicitly.
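A minimal sketch of the proposed behaviour, written as a wrapper rather than
a change to the file type itself. The class name `LineIterator` is
hypothetical, and the `__next__` alias is only there so the sketch also runs
on Python 3, which renamed next() to __next__():

```python
import io

class LineIterator:
    """Hypothetical wrapper sketching the proposal: __iter__ returns self and
    next() calls readline(), raising StopIteration when it returns ''."""

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def next(self):
        line = self.f.readline()
        if line == '':
            raise StopIteration
        return line

    # Python 3 spells the iterator method __next__; alias it so the sketch runs there.
    __next__ = next

f = io.StringIO("a\nb\nc\n")
lines = LineIterator(f)
for line in lines:
    if line == "b\n":
        break
print(list(lines))  # ['c\n']: iteration resumes right after the break
```

Because nothing is buffered inside the iterator, mixing for loops and plain
readline() calls on the same object stays consistent; the cost is one
readline() call per line, which is the speed trade-off the proposal would
make opt-in.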

definitions: 

iterable := hasattr(obj, '__iter__') 
iterator := hasattr(obj, '__iter__') and hasattr(obj, 'next')
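The same definitions as runnable predicates. The helper names are
hypothetical, and this sketch uses the Python 3 method name `__next__`; under
the 2.2 protocol the second check would look for 'next' instead:

```python
def is_iterable(obj):
    return hasattr(obj, '__iter__')

def is_iterator(obj):
    # The definition above checks for 'next'; Python 3 renamed it '__next__'.
    return hasattr(obj, '__iter__') and hasattr(obj, '__next__')

print(is_iterable([1, 2]), is_iterator([1, 2]))              # True False
print(is_iterable(iter([1, 2])), is_iterator(iter([1, 2])))  # True True
```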

If an object is iterable but not an iterator, it would be reasonable to
expect that it is also re-iterable. I don't know whether this should be a
requirement, but I think it would be a good idea for all builtin objects to
conform to it anyway. Currently files are the only builtin type that is
iterable but neither an iterator nor re-iterable.
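One rough way to probe re-iterability is to walk the object twice and compare
the passes. `same_items_twice` is a hypothetical helper and only a heuristic:
it consumes iterators, and an empty iterable passes trivially.

```python
def same_items_twice(obj):
    """Heuristic re-iterability check (hypothetical helper).

    Iterate obj twice and compare. An iterator is exhausted by the first
    pass, so its second pass comes back empty. Caveat: an empty iterable
    passes trivially.
    """
    first = list(obj)
    second = list(obj)
    return first == second

print(same_items_twice([1, 2, 3]))        # True: a list can be walked again
print(same_items_twice(iter([1, 2, 3])))  # False: the second pass yields []
```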

explicit-is-better-than-implicit-ly yours,

      Oren