[Python-Dev] Re: Reiterability

Guido van Rossum guido at python.org
Mon Oct 20 13:43:25 EDT 2003


> BTW, playing around with some of this it seems to me that the
> inability to just copy.copy (or copy.deepcopy) anything produced by
> iter(sequence) is more of a bother -- quite apart from clonability
> (a similar but separate concept), couldn't those iterators be
> copy'able anyway?  I.e. just expose underlying sequence and index as
> their state for getting and setting?

I'm not sure why you say it's separate from cloning; it seems to me
that copy.copy(iter(range(10))) should return *exactly* what we'd want
the proposed clone operation to return.
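In other words, what's being asked for would look something like this
(a sketch of the desired semantics only -- today's built-in sequence
iterators don't support copy.copy at all):

    import copy

    it = iter(range(10))
    it.next()                  # consume 0
    clone = copy.copy(it)      # would snapshot "position 1"
    print it.next()            # 1
    print clone.next()         # also 1, independently of 'it'

Both iterators would then advance independently from the shared
position, which is exactly what a clone operation is supposed to give
you.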

> Otherwise to get copyable
> iterators I have to reimplement iter "by hand":
> 
> class Iter(object):
>     def __init__(self, seq):
>         self.seq = seq
>         self.idx = 0
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.seq[self.idx]
>         except IndexError: raise StopIteration
>         self.idx += 1
>         return result
> 
> and I don't understand the added value of requiring the user to
> code this no-added-value, slow-things-down boilerplate.

I see this as a plea to add __copy__ and __deepcopy__ methods to all
standard iterators for which it makes sense.  (Or maybe only __copy__
-- I'm not sure what value __deepcopy__ would add.)

I find this a reasonable request for the iterators belonging to
standard containers (list, tuple, dict).  I guess that some of the
iterators in itertools might also support this easily.  Perhaps this
would be the road to supporting iterator cloning?
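For the hand-rolled Iter above, the hook is tiny; restating Alex's
class with the one extra method (a sketch, nothing official):

    import copy

    class Iter(object):
        def __init__(self, seq, idx=0):
            self.seq = seq
            self.idx = idx
        def __iter__(self):
            return self
        def next(self):
            try:
                result = self.seq[self.idx]
            except IndexError:
                raise StopIteration
            self.idx += 1
            return result
        def __copy__(self):
            # Share the underlying sequence, duplicate the position.
            return Iter(self.seq, self.idx)

    it = Iter(range(5))
    it.next()                        # 0
    clone = copy.copy(it)            # goes through __copy__
    print it.next(), clone.next()    # prints: 1 1

The built-in sequence iterators could do the same thing internally,
since their state is just a reference to the container plus an index.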

> > > An iterator that knows it's coming from disk or pipe can provide
> > > that disk copy (or reuse the existing file) as part of its
> > > "optimized tee-ability".
> >
> > At considerable cost.
> 
> I'm not sure I see that cost, yet.

Mostly complexity of the code to implement it, and things like making
sure that the disk file is deleted (not an easy problem
cross-platform!).

> > lines of a file previously available.  Also, on many systems,
> > every call to fseek() drops the stdio buffer, even if the seek
> > position is not actually changed by the call.  It could be done,
> > but would require incredibly hairy code.
> 
> The call to fseek probably SHOULD drop the buffer in a typical
> C implementation _on a R/W file_, because it's used as the way
> to signal the file that you're moving from reading to writing or
> vice versa (that's what the C standard says: you need a seek
> between an input op and an immediately following output op, or
> vice versa, even a seek to the current position, else the
> behavior is undefined -- which reminds me, I don't know if the
> _Python_ wrapper maintains that "clever" requirement for ITS R/W
> files, but I think it does).

Yes it does: file_seek() calls drop_readahead().
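(For reference, the rule being discussed looks like this at the Python
level -- illustrative only, with 'data.txt' standing in for some
existing file:

    f = open('data.txt', 'r+')
    line = f.readline()        # input operation
    f.seek(f.tell())           # intervening seek, even to the same spot
    f.write('replacement\n')   # output operation is now well-defined
    f.close()

Without the seek, the C standard leaves the write's behavior
undefined on an update-mode stream.)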

> I can well believe that for simplicity a C-library implementor would
> then drop the buffer on a R/O file too, needlessly but
> understandably.

For any stdio implementation supporting fileno(), fseek() is also used
to synch up the seek positions maintained by stdio and by the
underlying OS or file descriptor implementation.

> The deuced "for line in flob:" is so deucedly optimized that trying
> to compete with it, even with something as apparently trivial as
> Lines1, is apparently a lost cause;-).  OK, then I guess that an
> iterator by lines on a textfile can't easily be optimized for teeability
> by these "share the file object" strategies; rather, the best way to
> tee such a disk file would seem to be:

> def tee_diskfile(f):
>     result = file(f.name, f.mode)
>     result.seek(f.tell())
>     return f, result

Right, except you might want to change the mode to a read-only mode
(without losing the 'b' or 'U' property).
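Something along these lines (an untested sketch):

    def tee_diskfile(f):
        # Force read-only mode but keep the 'b'/'U' qualifiers.
        mode = 'r' + ''.join([c for c in f.mode if c in 'bU'])
        result = file(f.name, mode)
        result.seek(f.tell())
        return f, result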

--Guido van Rossum (home page: http://www.python.org/~guido/)


