Warning about "for line in file:"

Oren Tirosh oren-py-l at hishome.net
Mon Feb 18 05:34:28 EST 2002


On Fri, Feb 15, 2002 at 03:47:26PM -0500, Brian Kelley wrote:
> count = 0
> for line in file.xreadlines():
>     if count > 10: break
>     print line
>     count = count + 1
> 
> for line in file.xreadlines():
>     print line
> 
> So what is REALLY happening is that you are creating two separate 
> iterators in the above examples.  Writing "for line in file" instead of 
> "for line in file.xreadlines()" simply hides and confuses this.

If you trace the problem to its true source you will see that file objects 
are not really containers that can be iterated over - they are already
iterators. The container is the file on the disk.  A file iterator object is
not a real independent object, just a different protocol for accessing the
file object: next() and StopIteration instead of readline() and an empty
string. The buffering problem that started this thread is just a side effect
of this case of mistaken identity: iterators pretending to be containers.
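
To make the "already an iterator" point concrete, here is a minimal sketch
in present-day Python 3, which postdates this thread. It assumes
io.StringIO is an acceptable stand-in for a file on disk; the variable
names are illustrative only. Two successive for loops over the same object
share one read position, so the second loop resumes where the first stopped:

```python
import io

# io.StringIO stands in for an open text file (an assumption made so the
# sketch does not need a real file on disk).
f = io.StringIO("a\nb\nc\nd\n")

# The object is its own iterator: iter() hands back the very same object.
same_object = iter(f) is f

# First loop: stop early after two lines.
first = []
for line in f:
    first.append(line)
    if len(first) == 2:
        break

# Second loop: continues from the current position, it does not restart.
rest = [line for line in f]
```

A true container, such as a list of lines, would start again from the
first line in the second loop.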

There is no need for a separate object to implement another protocol. A
single object can expose both the iterator and file protocols:

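class file_(file):
    # The file object is its own iterator.
    def __iter__(self):
        return self

    def xreadlines(self):
        # Return the file itself instead of a separate xreadlines object.
        return self

    def next(self):
        # Iterator protocol expressed in terms of readline().
        s = self.readline()
        if s:
            return s
        else:
            # readline() returns an empty string at end of file.
            raise StopIteration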

I believe this class is fully backward compatible with existing sources that
use file iteration. Implementing file iterators using xreadlines objects was
just the quickest way to do it, by reusing an existing piece of code that was
written long before the iterator protocol.
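
For readers on Python 3, where the file builtin no longer exists, the same
idea can be sketched as a wrapper around any object with a readline()
method. This is only an illustration under stated assumptions: the class
name LineIterator is hypothetical, __next__ replaces the older next(), and
io.StringIO stands in for a real file.

```python
import io


class LineIterator:
    """Hypothetical wrapper exposing the file and iterator protocols together."""

    def __init__(self, fileobj):
        # Any object with a readline() method will do.
        self._f = fileobj

    def readline(self):
        return self._f.readline()

    def __iter__(self):
        return self

    def xreadlines(self):
        # Same object again: there is no second iterator with its own state.
        return self

    def __next__(self):
        s = self.readline()
        if s:
            return s
        # readline() returns an empty string at end of file.
        raise StopIteration


f = LineIterator(io.StringIO("one\ntwo\nthree\n"))
head = next(f)                  # iterator protocol
via_readline = f.readline()     # file protocol, same read position
tail = list(f.xreadlines())     # whatever is left
```

Because every access path goes through the same readline() call, mixing the
two protocols neither skips nor repeats lines.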

	Oren




