[Python-Dev] Re: Reiterability
Guido van Rossum
guido at python.org
Sat Oct 18 18:05:38 EDT 2003
> > Reiteration comes for free if you hold on to that underlying object
> > rather than passing an iterator to them around.
>
> Yes, but you need to pass around a somewhat complicated thing --
> the iterator (to have the "current state in the iteration"), the callable
> that needs to be called to generate the iterator again (iter, or the
> generator, or the class whose instances are numerical series, ...)
> and the arguments for that callable (the sequence, the generator's
> arguments, the parameters with which to instantiate the class, ...).
>
> Nothing terrible, admittedly, and that's presumably how I'd architect
> things IF I ever met a use case for a "reiterable iterator":
>
> class ReiterableIterator(object):
>     def __init__(self, thecallable, *itsargs, **itskwds):
>         self.c, self.a, self.k = thecallable, itsargs, itskwds
>         self.it = thecallable(*itsargs, **itskwds)
>     def __iter__(self): return self
>     def next(self): return self.it.next()
>     def reiter(self): return self.__class__(self.c, *self.a, **self.k)
Why put support for a callable with arbitrary arguments in the
ReiterableIterator class? Why not say it's called without args, and
if the user has a need to use something with args, they can use one of
the many approaches to currying?
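A minimal sketch of that simplification, assuming present-day Python 3 (so __next__ rather than the next method in the quoted code). functools.partial stands in for "one of the many approaches to currying"; the multiples generator is a made-up example, not something from the thread.

```python
from functools import partial
from itertools import islice


class ReiterableIterator:
    """Wraps a zero-argument callable that returns a fresh iterator."""

    def __init__(self, factory):
        self.factory = factory
        self.it = factory()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.it)

    def reiter(self):
        # Start over by calling the stored factory again.
        return self.__class__(self.factory)


def multiples(step, start=0):
    # Hypothetical generator that needs arguments.
    x = start
    while True:
        yield x
        x += step


# The caller curries the arguments; the class never sees them.
evens = ReiterableIterator(partial(multiples, 2))
first_pass = list(islice(evens, 3))
second_pass = list(islice(evens.reiter(), 3))
```

Both passes start from the beginning of the series, while evens itself keeps its own position.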
> typical toy example use:
>
> def printwice(n, reiter):
>     for i, x in enumerate(reiter):
>         if i>=n: break
>         print x
>     for i, x in enumerate(reiter.reiter()):
>         if i>=n: break
>         print x
>
> def evens():
>     x = 0
>     while 1:
>         yield x
>         x += 2
>
> printwice(5, ReiterableIterator(evens))
Are there any non-toy examples?
I'm asking because I can't remember ever having had this need myself.
> > [Alex again]
> >
> > > There ARE other features I'd REALLY have liked to get from iterators
> > > in some applications.
> > >
> > > A "snapshot" -- providing me two iterators, the original one and
> > > another, which will step independently over the same sequence of
> > > items -- would have been really handy at times. And a "step back"
> ...
> > > disturbed); but not knowing the abilities of the underlying iterator
> > > would mean these wrappers would often duplicate functionality
> > > needlessly.
> >
> > I don't see how it can be done without an explicit request for such a
> > wrapper in the calling code. If the underlying iterator is ephemeral
> > (is not reiterable) the snapshotter has to save a copy of every item,
> > and that would defeat the purpose of iterators if it was done
> > automatically. Or am I misunderstanding?
>
> No, you're not. But, if the need to snapshot (or reiterate, very
> different thing) was deemed important (and I have my doubts if
> either of them IS important enough -- I suspect snapshot perhaps,
> reiterable not, but I don't _know_), we COULD have those iterators
> which "know how to snapshot themselves" expose a .snapshot or
> __snapshot__ method. Then a function make_a_snapshottable(it) [the
> names are sucky, sorry, bear with me] would return it if that method
> was available, otherwise the big bad wrapper around it.
A better name would be clone(); copy() would work too, as long as it's
clear that it copies the iterator, not the underlying sequence or
series. (Subtle difference!)
Reiteration is a special case of cloning: simply stash away a clone
before you begin.
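To make the "stash away a clone" idea concrete, here is a rough sketch in present-day Python 3. The SequenceIterator class and its clone() method are illustrative assumptions, not an existing API; the point is only that clone() copies the iterator's position and shares the underlying sequence.

```python
class SequenceIterator:
    """Iterator over an indexable sequence that can clone itself cheaply."""

    def __init__(self, sequence, index=0):
        self.sequence = sequence
        self.index = index

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.sequence[self.index]
        except IndexError:
            raise StopIteration from None
        self.index += 1
        return item

    def clone(self):
        # Copies the iterator state only; the sequence itself is shared.
        return SequenceIterator(self.sequence, self.index)


it = SequenceIterator("abc")
start = it.clone()          # stash a clone before beginning
first_pass = list(it)
second_pass = list(start)   # "reiteration" via the stashed clone
```

A clone taken partway through would resume from that point instead, which is what makes cloning more general than restarting.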
> Basically, by exposing suitable methods an iterator could "make its
> abilities known" to functions that may or may not need to wrap it in
> order to achieve certain semantics -- so the functions can build
> only those wrappers which are truly indispensable for the purpose.
> Roughly the usual "protocol" approach -- functions use an object's
> ability IF that object exposes methods providing that ability, and
> otherwise fake it on their own.
In this case I'm not sure if it is desirable to do this automatically.
If I request a clone of an iterator for a data stream coming from a
pipe or socket, it would have to start buffering everything. Sure, I
can come up with a buffering class that throws away buffered data that
none of the existing clones can reach, but I very much doubt if it's
worth it; a customized buffering scheme for the application at hand
would likely be more efficient than a generic solution.
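For illustration, a sketch of what the automatic version could look like in present-day Python 3. The clone_pair helper and the clone() method it probes for are assumptions made up for this example; the generic fallback is itertools.tee, which buffers exactly the items one branch has consumed and the other has not.

```python
from itertools import tee


def clone_pair(it):
    """Return two independent iterators over the remaining items of `it`."""
    clone = getattr(it, "clone", None)
    if callable(clone):
        # The iterator knows how to clone itself; no buffering needed.
        return it, clone()
    # Generic fallback: tee buffers items one branch has seen and the
    # other has not. The original `it` must not be used afterwards.
    return tee(it)


def stream():
    # Stand-in for data arriving from a pipe or socket.
    yield from ("a", "b", "c")


ahead, behind = clone_pair(stream())
peeked = [next(ahead), next(ahead)]   # these two items are now buffered
replayed = list(behind)
rest = list(ahead)
```

The buffer grows with the distance between the two branches, which is the cost being weighed above against an application-specific scheme.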
> > I'm not sure what you are suggesting here. Are you proposing that
> > *some* iterators (those which can be snapshotted cheaply) sprout a new
> > snapshot() method?
>
> If snapshottability (eek!) is important enough, yes, though __snapshot__
> might perhaps be more traditional (but for iterators we do have the
> precedent of method next without __underscores__).
(Which I've admitted before was a mistake.)
A problem I have with making iterator cloning a standard option is
that this would pretty much require that all iterators for which
cloning can be implemented should implement clone(). That in turn
means that iterator implementors have to work harder (sometimes
cloning can be done cheaply, but it might require a different
refactoring of the iterator implementation).
Another issue is that it would make generators second-class citizens,
since they cannot be cloned. (It would seem to be possible to copy a
stack frame, but that raises the question of whether to use shallow or deep
copying -- if a local variable in a generator references a list,
should the list be copied or not? And if it should be copied, should
it be a deep or shallow copy? There's no good answer without knowing
the intention of the programmer.)
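A small check of the "generators cannot be cloned" point, assuming a current CPython 3 where the copy module refuses generator objects with a TypeError. The counter generator is a made-up example whose local variable references a list, the case where the shallow-or-deep question would arise.

```python
import copy


def counter(items):
    # `items` is a local that references a mutable list.
    for item in items:
        yield item


gen = counter([1, 2, 3])
next(gen)

try:
    copy.copy(gen)
    copied = True
except TypeError:
    # CPython refuses to copy generator objects.
    copied = False
```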
> > > As I said I do have use cases for all of these. Simplest is the
> > > ability to push back the last item obtained by next, since a
> > > frequent
>
> Yeah, that's really easy to provide by a lightweight wrapper, which
> was my not-so-well-clarified intended point.
>
> > This definitely sounds like you'd want to create an explicit wrapper
>
> Absolutely.
>
> > Perhaps a snapshottable iterator could also have a backup() method
> > (which would decrement self.i in your first example) or a prev()
> > method (which would return self.sequence[self.i] and decrement
> > self.i).
>
> It seems to me that the ability to back up and that of snapshotting
> are somewhat independent.
Backing up suggests a strictly limited buffer; cloning suggests a
potentially arbitrarily large buffer. If backing up is what you
really need, it's easy to provide a wrapper for it (with a buffer
limit argument). Since the buffer is bounded, keeping a few
copies of items that aren't strictly necessary won't hurt; it doesn't
have the issue of wasting space with a full copy of an existing
sequence (or worse, of an easily regenerated series).
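A sketch of such a backup wrapper in present-day Python 3. The class name, the backup() method and the limit parameter are assumptions chosen for illustration; collections.deque with maxlen provides the strictly limited buffer.

```python
from collections import deque


class BackupIterator:
    """Iterator wrapper that can step back over at most `limit` items."""

    def __init__(self, iterable, limit=1):
        self.it = iter(iterable)
        self.history = deque(maxlen=limit)  # items already returned
        self.pending = []                   # items pushed back, to replay

    def __iter__(self):
        return self

    def __next__(self):
        item = self.pending.pop() if self.pending else next(self.it)
        self.history.append(item)
        return item

    def backup(self):
        """Un-read the most recently returned item."""
        if not self.history:
            raise ValueError("no buffered item to back up over")
        self.pending.append(self.history.pop())


it = BackupIterator(range(5), limit=2)
seen = [next(it), next(it), next(it)]
it.backup()
it.backup()
replayed = [next(it), next(it), next(it)]
```

With limit=2 only the last two items can be un-read; a third backup() in a row raises ValueError, so memory use stays fixed no matter how long the underlying series is.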
> > > A "snapshot" would be useful whenever more than one pass on a
> > > sequence _or part of it_ is needed (more useful than a "restart"
> > > because of the "part of it" provision). And a decent wrapper
> > > for it is a bear...
> >
> > Such wrappers for specific container types (or maybe just one for
> > sequences) could be in a standard library module. Is more needed?
>
> I think that if it's worth providing a wrapper it's also worth
> having those iterators that don't need the wrapper (because they
> already intrinsically have the needed ability) sprout the relevant
> method or special method; "factory functions" provided with the
> wrappers could then just return the already-satisfactory iterator,
> or a wrapper built around it, depending.
>
> Problem is, I'm NOT sure if "it's worth providing a wrapper" in each
> of these cases. snapshottingability (:-) is the one case where, if
> I had to decide myself right now, I'd say "go for it"... but that
> may be just because it's the one case for which I happened to
> stumble on some use cases in production (apart from "undoing", which
> isn't too bad to handle in other ways anyway).
I'd like to hear more about those cases, to see if they really need
cloning (:-) or can live with a fixed limited backup capability.
I think a standard backup wrapper would be a useful thing to have
(maybe in itertools?); since generator functions can't be cloned, I'm
going to push back on the need for cloning for now until I see a lot
more non-toy evidence.
--Guido van Rossum (home page: http://www.python.org/~guido/)