[Python-Dev] Single- vs. Multi-pass iterability

Guido van Rossum guido@python.org
Mon, 15 Jul 2002 10:39:51 -0400


> http://www.python.org/sf/580331
> 
> No, it's not a complete rewrite of file buffering.  This patch
> implements Just's idea of xreadlines caching in the file object.  It
> also makes a file into an iterator: __iter__ returns self and next
> calls the next method of the cached xreadlines object.

Hm.  What happens to the xreadlines object when you do a seek() on the
file?

With the old semantics, you could do f.seek(0) and get another
iterator (assuming it's a seekable file of course).  With the new
semantics, the cached iterator keeps getting in the way.

Maybe the xreadlines object could grow a flush() method that throws
away its buffer, and f.seek() could call that if there's a cached
xreadlines iterator?
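A minimal sketch of that flush-on-seek idea (a hypothetical Python wrapper, not the actual C file-object change being discussed): the wrapper caches a line iterator, and seek() discards the cache so the next iteration step reads from the new position.

```python
# Hypothetical sketch of the caching scheme under discussion: cache a
# line iterator on first use, and have seek() "flush" the cache so a
# fresh iterator is created afterwards.  Names are illustrative only.
class IterCachingFile:
    def __init__(self, path):
        self._f = open(path)
        self._lineiter = None          # cached xreadlines-style iterator

    def __iter__(self):
        return self                    # the file itself is the iterator

    def __next__(self):
        if self._lineiter is None:
            # iter(callable, sentinel): stops when readline() returns ''
            self._lineiter = iter(self._f.readline, '')
        return next(self._lineiter)

    def seek(self, pos, whence=0):
        self._lineiter = None          # throw away the cached iterator
        return self._f.seek(pos, whence)

    def close(self):
        self._f.close()
```

With this, f.seek(0) on a seekable file gives you a fresh pass over the lines instead of the stale cached iterator getting in the way.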

> See my previous postings for why I think a file should be an iterator.

Haven't seen them, but I would agree that this makes sense.

> With this patch any combination of multiple xreadlines and iterator
> protocol operations on a file object is safe. Using
> xreadlines/iterator followed by regular readline has the same
> buffering problem as before.

Agreed.
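For comparison, the io rewrite that later Pythons adopted unified the buffering, so the problem case described here (iterator protocol followed by a regular readline on the same file) became safe:

```python
# In the io module both next(f) and f.readline() share one buffered
# reader, so mixing them does not leave stale buffered data behind.
import tempfile, os

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tf:
    tf.write('line1\nline2\nline3\n')
    path = tf.name

f = open(path)
via_iter = next(f)            # iterator protocol
via_readline = f.readline()   # regular readline right after
f.close()
os.unlink(path)
```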

I just realized that the (existing) file_xreadlines() function has a
subtle bug.  It uses a local static variable to cache the function
xreadlines imported from the module xreadlines.  But if there are
multiple interpreters or Py_Finalize() is called and then
Py_Initialize() again, the cache is invalid.  Would you mind fixing
this?  I think the caching just isn't worth it -- just do the import
every time (it's fast enough if sys.modules['xreadlines'] already
exists).
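The "fast enough" claim is easy to check from Python: once a module is in sys.modules, a repeated import statement is essentially a dict lookup that returns the cached module object (shown here with os, since the 2.x xreadlines module is long gone):

```python
# A repeated import of an already-loaded module hits the sys.modules
# cache and returns the same module object; no loading work is redone.
import sys
import os

cached = sys.modules['os']    # the first import put the module here

def get_module():
    # "do the import every time": this is just a cache lookup now
    import os
    return os
```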

--Guido van Rossum (home page: http://www.python.org/~guido/)