[Python-Dev] Single- vs. Multi-pass iterability

Oren Tirosh oren-py-d@hishome.net
Tue, 16 Jul 2002 08:25:03 +0300


On Mon, Jul 15, 2002 at 10:39:51AM -0400, Guido van Rossum wrote:
> > http://www.python.org/sf/580331
> > 
> > No, it's not a complete rewrite of file buffering.  This patch
> > implements Just's idea of xreadlines caching in the file object.  It
> > also makes a file into an iterator: __iter__ returns self and next
> > calls the next method of the cached xreadlines object.
> 
> Hm.  What happens to the xreadlines object when you do a seek() on the
> file?
> With the old semantics, you could do f.seek(0) and get another
> iterator (assuming it's a seekable file of course).  With the new
> semantics, the cached iterator keeps getting in the way.

In the new version of patch #580331, the cache is invalidated on a seek.

> Maybe the xreadlines object could grow a flush() method that throws
> away its buffer, and f.seek() could call that if there's a cached
> xreadlines iterator?

The behavior of an xreadlines object is already undefined after a seek on 
the file.  This patch doesn't try to fix that.  The invalidation makes sure 
that the next iter() call will produce a fresh xreadlines, though.
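
To make the invalidation concrete, here is a rough Python sketch of the
idea.  It is not the C code from the patch: the class name CachedLineFile
and the _line_iter attribute are made up for illustration, and it uses the
modern __next__ spelling rather than the next method discussed in this
thread.

```python
import io


class CachedLineFile:
    """Hypothetical stand-in for the patched file object (not the real patch)."""

    def __init__(self, raw):
        self._raw = raw          # any seekable object with readline()
        self._line_iter = None   # plays the role of the cached xreadlines object

    def __iter__(self):
        # The file is its own iterator.
        return self

    def __next__(self):
        if self._line_iter is None:
            # iter(callable, sentinel) calls readline() until it returns "".
            self._line_iter = iter(self._raw.readline, "")
        return next(self._line_iter)

    def seek(self, offset, whence=0):
        # Invalidate the cache so the next iteration starts from a fresh
        # line iterator instead of a stale one.
        self._line_iter = None
        return self._raw.seek(offset, whence)


f = CachedLineFile(io.StringIO("a\nb\n"))
first = next(f)        # consumes one line
f.seek(0)              # drops the cached iterator
lines = list(f)        # iterates again from the start of the file
```

Without the reset in seek(), the exhausted or half-consumed iterator would
keep getting in the way, which is the problem raised above.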

Flushing would be too much work for this little hack. The right solution 
would be to fully integrate buffering into the file object and get rid of 
the dependency on the xreadlines module. The xreadlines method would then be 
equivalent to __iter__ (i.e. return self).  I assume that after this rewrite
the xreadlines module would be deprecated.
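
As a sketch of what that end state might look like from the Python side
(the class name LineFile is hypothetical, readline() on the wrapped object
stands in for the integrated buffering, and the code uses the modern
__next__ spelling):

```python
import io


class LineFile:
    """Hypothetical file-like object that is its own line iterator."""

    def __init__(self, raw):
        self._raw = raw  # assumed to provide readline()

    def __iter__(self):
        return self

    def __next__(self):
        line = self._raw.readline()
        if not line:
            raise StopIteration
        return line

    def xreadlines(self):
        # Equivalent to __iter__: no separate xreadlines object is created.
        return self


f = LineFile(io.StringIO("x\ny\nz\n"))
head = next(f)                 # take the first line
rest = list(f.xreadlines())    # continues from the current position
```

Since xreadlines() and iter() both return the file itself, every loop
over it shares one position; that is the single-pass behavior this thread
is about.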

> > See my previous postings for why I think a file should be an iterator.
> 
> Haven't seen them but I would agree that this makes sense.

For some reason I got the impression that you disagreed.
 
> I just realized that the (existing) file_xreadlines() function has a
> subtle bug.  It uses a local static variable to cache the function
> xreadlines imported from the module xreadlines.  But if there are
> multiple interpreters or Py_Finalize() is called and then
> Py_Initialize() again, the cache is invalid.  Would you mind fixing
> this?  I think the caching just isn't worth it -- just do the import
> every time (it's fast enough if sys.modules['xreadlines'] already
> exists).

Done.
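
For anyone who wants to see the failure mode without reading the C, here
is a Python analogy.  It is only an analogy: a module-level global plays
the part of the C local static variable, swapping the sys.modules entry
plays the part of Py_Finalize() followed by Py_Initialize(), and the
module name fake_xreadlines is invented for the example.

```python
import importlib
import sys
import types


def install_fake_module(tag):
    """Register a throwaway module named 'fake_xreadlines' (hypothetical)."""
    mod = types.ModuleType("fake_xreadlines")
    mod.xreadlines = lambda f: (tag, f)
    sys.modules["fake_xreadlines"] = mod
    return mod


_cached = None  # analogue of the C local static variable


def get_cached():
    """Look the function up once and keep it forever (the buggy pattern)."""
    global _cached
    if _cached is None:
        _cached = importlib.import_module("fake_xreadlines").xreadlines
    return _cached


def get_fresh():
    """Do the import every time; cheap when sys.modules has the entry."""
    return importlib.import_module("fake_xreadlines").xreadlines


install_fake_module("first")
get_cached()                      # cache is filled from the first module

install_fake_module("second")     # simulates re-initialization
stale = get_cached()(None)[0]     # still the function from the old module
fresh = get_fresh()(None)[0]      # picks up the current module
```

The cached lookup keeps handing back an object from the old module, while
the per-call import follows whatever sys.modules currently holds.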

	Oren