[Python-Dev] Re: Reiterability
Guido van Rossum
guido at python.org
Mon Oct 20 13:43:25 EDT 2003
> BTW, playing around with some of this it seems to me that the
> inability to just copy.copy (or copy.deepcopy) anything produced by
> iter(sequence) is more of a bother -- quite apart from clonability
> (a similar but separate concept), couldn't those iterators be
> copy'able anyway? I.e. just expose underlying sequence and index as
> their state for getting and setting?
I'm not sure why you say it's separate from cloning; it seems to me
that copy.copy(iter(range(10))) should return *exactly* what we'd want
the proposed clone operation to return.
> Otherwise to get copyable
> iterators I have to reimplement iter "by hand":
>
> class Iter(object):
>     def __init__(self, seq):
>         self.seq = seq
>         self.idx = 0
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.seq[self.idx]
>         except IndexError: raise StopIteration
>         self.idx += 1
>         return result
>
> and I don't understand the added value of requiring the user to
> code this no-added-value, slow-things-down boilerplate.
I see this as a plea to add __copy__ and __deepcopy__ methods to all
standard iterators for which it makes sense. (Or maybe only __copy__
-- I'm not sure what value __deepcopy__ would add.)
I find this a reasonable request for the iterators belonging to
standard containers (list, tuple, dict). I guess that some of the
iterators in itertools might also support this easily. Perhaps this
would be the road to supporting iterator cloning?
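[Editorial sketch, not part of the original message.] One way to picture the request is the quoted Iter class with a __copy__ method added. The class name CopyableSeqIter and the idx constructor argument are made up for illustration, and the code uses the Python 3 spelling __next__ rather than the next method of the 2003 code. It is only a sketch of the idea, not how the built-in iterators are implemented.

```python
import copy


class CopyableSeqIter:
    """Hypothetical index-based iterator over a sequence, with __copy__."""

    def __init__(self, seq, idx=0):
        self.seq = seq
        self.idx = idx

    def __iter__(self):
        return self

    def __next__(self):  # spelled `next` in the Python 2 code quoted above
        try:
            result = self.seq[self.idx]
        except IndexError:
            raise StopIteration from None
        self.idx += 1
        return result

    def __copy__(self):
        # Share the underlying sequence; duplicate only the position.
        return type(self)(self.seq, self.idx)


it = CopyableSeqIter([10, 20, 30])
next(it)                 # consume the first item
clone = copy.copy(it)    # copy.copy dispatches to __copy__
print(list(clone))       # [20, 30]
print(list(it))          # [20, 30]
```

The copy is shallow on purpose: both iterators walk the same sequence object, so the state that needs duplicating is just the index.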
> > > An iterator that knows it's coming from disk or pipe can provide
> > > that disk copy (or reuse the existing file) as part of its
> > > "optimized tee-ability".
> >
> > At considerable cost.
>
> I'm not sure I see that cost, yet.
Mostly complexity of the code to implement it, and things like making
sure that the disk file is deleted (not an easy problem
cross-platform!).
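[Editorial sketch, not part of the original message.] In current Python, much of the cleanup burden can be handed to tempfile.TemporaryFile, which is documented to destroy the file as soon as it is closed. The helper name spool_lines is hypothetical, and the sketch copies the lines eagerly, which is a simplification of the "disk copy" idea discussed in the thread.

```python
import tempfile


def spool_lines(lines):
    """Hypothetical helper: copy text lines to an anonymous temp file."""
    # newline="" keeps line endings exactly as they were written.
    spool = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="")
    spool.writelines(lines)
    spool.seek(0)  # rewind so the caller can read the lines back
    return spool


with spool_lines(["first\n", "second\n"]) as spool:
    replay = list(spool)
# Leaving the `with` block closes the file, which also removes it.
print(replay)
```

This does not cover the harder case where the process dies before the file is closed; how that behaves depends on the platform.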
> > lines of a file previously available. Also, on many systems,
> > every call to fseek() drops the stdio buffer, even if the seek
> > position is not actually changed by the call. It could be done,
> > but would require incredibly hairy code.
>
> The call to fseek probably SHOULD drop the buffer in a typical
> C implementation _on a R/W file_, because it's used as the way
> to signal the file that you're moving from reading to writing or VV
> (that's what the C standard says: you need a seek between an
> input op and an immediately successive output op or vice versa,
> even a seek to the current point, else, undefined behavior -- which
> reminds me, I don't know if the _Python_ wrapper maintains that
> "clever" requirement for ITS R/W files, but I think it does).
Yes it does: file_seek() calls drop_readahead().
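[Editorial sketch, not part of the original message.] The C rule quoted above (a seek between a read and a following write on an update-mode stream, even a seek to the current position) looks like this when written out in Python. Note the assumption: Python 3's io module does not sit on top of C stdio, so there the explicit seek is shown only to illustrate the pattern, not because the code is known to fail without it.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"abcdef")

    with open(path, "r+b") as f:
        head = f.read(3)   # input operation
        f.seek(f.tell())   # seek to the current position before switching
        f.write(b"XYZ")    # output operation

    with open(path, "rb") as f:
        data = f.read()
finally:
    os.remove(path)

print(head, data)
```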
> I can well believe that for simplicity a C-library implementor would
> then drop the buffer on a R/O file too, needlessly but
> understandably.
For any stdio implementation supporting fileno(), fseek() is also used
to synch up the seek positions maintained by stdio and by the
underlying OS or file descriptor implementation.
> The deuced "for line in flob:" is so deucedly optimized that trying
> to compete with it, even with something as apparently trivial as
> Lines1, is apparently a lost cause;-). OK, then I guess that an
> iterator by lines on a textfile can't easily be optimized for teeability
> by these "share the file object" strategies; rather, the best way to
> tee such a disk file would seem to be:
>     def tee_diskfile(f):
>         result = file(f.name, f.mode)
>         result.seek(f.tell())
>         return f, result
Right, except you might want to change the mode to a read-only mode
(without losing the 'b' or 'U' property).
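[Editorial sketch, not part of the original message.] A possible Python 3 rendering of that adjustment, under several assumptions: open replaces the old file builtin, the function name tee_diskfile_readonly is made up, and the 'U' flag is not carried over because text mode already translates newlines by default in Python 3. Binary mode is the safer case; in text mode tell() returns an opaque value and can raise OSError once the file has been iterated with next().

```python
def tee_diskfile_readonly(f):
    """Reopen f's file read-only, positioned where f currently is."""
    if "b" in f.mode:
        # Keep the binary property, drop any write or append capability.
        result = open(f.name, "rb")
    else:
        # Text mode: reuse the original encoding so positions line up.
        result = open(f.name, "r", encoding=f.encoding)
    result.seek(f.tell())
    return f, result
```

The caller owns the second file object and should close it; the two objects then advance independently because each has its own OS-level position.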
--Guido van Rossum (home page: http://www.python.org/~guido/)