[Python-Dev] Re: Reiterability
Guido van Rossum
guido at python.org
Mon Oct 20 13:43:25 EDT 2003
> BTW, playing around with some of this it seems to me that the
> inability to just copy.copy (or copy.deepcopy) anything produced by
> iter(sequence) is more of a bother -- quite apart from clonability
> (a similar but separate concept), couldn't those iterators be
> copy'able anyway? I.e. just expose underlying sequence and index as
> their state for getting and setting?
I'm not sure why you say it's separate from cloning; it seems to me
that copy.copy(iter(range(10))) should return *exactly* what we'd want
the proposed clone operation to return.
> Otherwise to get copyable
> iterators I have to reimplement iter "by hand":
> class Iter(object):
>     def __init__(self, seq):
>         self.seq = seq
>         self.idx = 0
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.seq[self.idx]
>         except IndexError: raise StopIteration
>         self.idx += 1
>         return result
> and I don't understand the added value of requiring the user to
> code this no-added-value, slow-things-down boilerplate.
I see this as a plea to add __copy__ and __deepcopy__ methods to all
standard iterators for which it makes sense. (Or maybe only __copy__
-- I'm not sure what value __deepcopy__ would add.)
I find this a reasonable request for the iterators belonging to
standard containers (list, tuple, dict). I guess that some of the
iterators in itertools might also support this easily. Perhaps this
would be the road to supporting iterator cloning?
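As a rough illustration of what a __copy__ method on a sequence iterator
could look like, here is a minimal sketch written for current Python
(so __next__ rather than next). The class name CopyableSeqIter is
hypothetical, and this is not how the built-in iterators are implemented;
it only assumes that the iterator's whole state is the underlying
sequence plus an index, as suggested in the quoted message.

```python
import copy


class CopyableSeqIter:
    """Hypothetical sequence iterator whose state is (seq, idx)."""

    def __init__(self, seq, idx=0):
        self.seq = seq
        self.idx = idx

    def __iter__(self):
        return self

    def __next__(self):
        try:
            result = self.seq[self.idx]
        except IndexError:
            raise StopIteration
        self.idx += 1
        return result

    def __copy__(self):
        # Share the underlying sequence, duplicate only the position.
        return type(self)(self.seq, self.idx)


it = CopyableSeqIter("abcd")
next(it)                    # consume 'a'
clone = copy.copy(it)       # copy.copy() dispatches to __copy__
rest_of_it = list(it)       # the two iterators now advance independently
rest_of_clone = list(clone)
```

Because the copy shares the sequence and duplicates only the index, it
behaves like the clone discussed above: both iterators continue from the
same point without affecting each other.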
> > > An iterator that knows it's coming from disk or pipe can provide
> > > that disk copy (or reuse the existing file) as part of its
> > > "optimized tee-ability".
> > At considerable cost.
> I'm not sure I see that cost, yet.
Mostly complexity of the code to implement it, and things like making
sure that the disk file is deleted (not an easy problem
> > lines of a file previously available. Also, on many systems,
> > every call to fseek() drops the stdio buffer, even if the seek
> > position is not actually changed by the call. It could be done,
> > but would require incredibly hairy code.
> The call to fseek probably SHOULD drop the buffer in a typical
> C implementation _on a R/W file_, because it's used as the way
> to signal the file that you're moving from reading to writing or VV
> (that's what the C standard says: you need a seek between an
> input op and an immediately successive output op or vice versa,
> even a seek to the current point, else, undefined behavior -- which
> reminds me, I don't know if the _Python_ wrapper maintains that
> "clever" requirement for ITS R/W files, but I think it does).
Yes it does: file_seek() calls drop_readahead().
> I can well believe that for simplicity a C-library implementor would
> then drop the buffer on a R/O file too, needlessly but
For any stdio implementation supporting fileno(), fseek() is also used
to synch up the seek positions maintained by stdio and by the
underlying OS or file descriptor implementation.
> The deuced "for line in flob:" is so deucedly optimized that trying
> to compete with it, even with something as apparently trivial as
> Lines1, is apparently a lost cause;-). OK, then I guess that an
> iterator by lines on a textfile can't easily be optimized for teeability
> by these "share the file object" strategies; rather, the best way to
> tee such a disk file would seem to be:
> def tee_diskfile(f):
>     result = file(f.name, f.mode)
>     return f, result
Right, except you might want to change the mode to a read-only mode
(without losing the 'b' or 'U' property).
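A minimal sketch of that mode adjustment, written for current Python
(open() instead of file()). The helper readonly_mode is a hypothetical
name introduced here; it keeps only the 'b' flag, on the assumption that
'U' needs no special handling because text mode in Python 3 already uses
universal newlines by default. Note that the reopened file starts at
offset 0 rather than at f's current position, just as in the quoted
version.

```python
import os
import tempfile


def readonly_mode(mode):
    """Hypothetical helper: reduce a mode string to read-only, keeping 'b'."""
    return "rb" if "b" in mode else "r"


def tee_diskfile(f):
    # Reopen the same path read-only; the new handle has its own position.
    return f, open(f.name, readonly_mode(f.mode))


# Usage: build a small scratch file, then tee a read/write handle on it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
    path = tmp.name

with open(path, "r+") as original:
    first, second = tee_diskfile(original)
    with second:
        second_mode = second.mode
        lines_a = list(first)
        lines_b = list(second)

os.remove(path)
```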
--Guido van Rossum (home page: http://www.python.org/~guido/)