[Python-Dev] Reiterability
Guido van Rossum
guido at python.org
Sat Oct 18 13:17:40 EDT 2003
[Guido]
> >Oh, no. Not reiterability again. How can you promise something to be
> >reiterable if you don't know whether the underlying iterator can be
> >reiterated? Keeping a hidden buffer would be a bad idea.
[Alex]
> I agree it would be bad to have "black magic" performed by every
> iterator to fulfil a contract that may or may not be useful to
> clients and might be costly to fulfil.
>
> IF "reiterability" is useful (and I'd need to see some use cases,
> because I don't particularly recall pining for it in Python) it
> should be exposed as a separate protocol that may or may not be
> offered by any given iterator type. E.g., the presence of a special
> method __reiter__ could indicate that this iterator IS able to
> supply another iterator which retraces the same steps from the
> start; and perhaps iter(xxx, reiterable=True) could strive to
> provide a reiterable iterator for xxx, which might justify building
> one that keeps a hidden buffer as a last resort. But first, I'd
> like use cases...
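(For concreteness, the client side of such an opt-in protocol might look
roughly like the sketch below; the __reiter__ name is purely hypothetical,
nothing like it exists today:)

    def reiter(it):
        # Hypothetical helper: only iterators that can cheaply retrace
        # their steps would grow a __reiter__ method; everything else is
        # rejected rather than silently buffered.
        if hasattr(it, '__reiter__'):
            return it.__reiter__()
        raise TypeError("iterator is not reiterable")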
In cases where reiterability can be implemented without much effort,
there is already an underlying object representing the sequence
(e.g. a collection object, or an object defining a numerical series).
Reiteration comes for free if you hold on to that underlying object
rather than passing around an iterator over it.
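(A trivial illustration of the difference, with a plain list standing in
for the underlying object:)

    S = [1, 2, 3]

    # Pass the collection itself around: every for-loop calls iter(S)
    # anew, so a second pass works with no extra machinery.
    for x in S:
        print x
    for x in S:      # fine: a fresh iterator over the same list
        print x

    # Pass an iterator instead and the second pass sees nothing:
    it = iter(S)
    for x in it:
        print x
    for x in it:     # exhausted; this loop body never runs
        print x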
[Phillip]
> I think I phrased my question poorly. What I should have said was:
>
> "Should iterator expressions preserve the reiterability of the base
> expression?"
(An iterator expression being something like
(f(x) for x in S)
right?)
> I don't want to make them guarantee reiterability, only to preserve
> it if it already exists. Does that make more sense?
>
> In essence, this would be done by having an itercomp expression
> resolve to an object whose __iter__ method calls the underlying
> generator, returning a generator-iterator. Thus, any iteration over
> the itercomp is equivalent to calling a no-arguments generator. The
> result is reiterable if the base iterable is reiterable, otherwise
> not.
OK, I think I understand what you're after. The code for an iterator
expression has to create a generator function behind the scenes, and
call it. For example:
A = (f(x) for x in S)
could be translated into:
def gen(seq):
    for x in seq:
        yield f(x)

A = gen(S)
(Note that S could be an arbitrary expression and should be evaluated
only once. This translation does that correctly.)
This allows one to iterate once over A (a generator function doesn't
allow reiteration). What you are asking looks like it could be done
like this (never mind the local names):
def gen(seq):
    for x in seq:
        yield f(x)

class Helper:
    def __init__(self, seq):
        self.seq = seq
    def __iter__(self):
        return gen(self.seq)

A = Helper(S)
Then every time you use iter(A), gen() will be called with the saved
value of S as its argument.
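For example, assuming S is itself reiterable (say, a list) and f is the
function from the sketch above:

    S = [1, 2, 3]
    A = Helper(S)
    print list(A)    # [f(1), f(2), f(3)]
    print list(A)    # the same again: each list() call invokes iter(A),
                     # which calls gen(S) and returns a fresh generator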
> I suppose technically, this means the itercomp doesn't return an
> iterator, but an iterable, which I suppose could be confusing if you
> try to call its 'next()' method. But then, it could have a next()
> method that raises an error saying "call 'iter()' on me first".
I don't mind that so much, but I don't think all the extra machinery
is worth it; the compiler generally can't tell if it is needed so it
has to produce the reiterable code every time. If you *want* to
have an iterable instead of an iterator, it's usually easy enough to do
(especially given knowledge about the type of S).
[Alex again]
> There ARE other features I'd REALLY have liked to get from iterators
> in some applications.
>
> A "snapshot" -- providing me two iterators, the original one and
> another, which will step independently over the same sequence of
> items -- would have been really handy at times. And a "step back"
> facility ("undo" of the last call to next) -- sometimes one level
> would suffice, sometimes not; often I could have provided the item
> to be "pushed back" so the iterator need not retain memory of it
> independently, but that wouldn't always be handy. Now any of these
> can be built as a wrapper over an existing iterator, of course --
> just like 'reiterability' could (and you could in fact easily
> implement reiterability in terms of snapshotting, by just ensuring a
> snapshot is taken at the start and further snapshotted but never
> disturbed); but not knowing the abilities of the underlying iterator
> would mean these wrappers would often duplicate functionality
> needlessly.
I don't see how it can be done without an explicit request for such a
wrapper in the calling code. If the underlying iterator is ephemeral
(is not reiterable) the snapshotter has to save a copy of every item,
and that would defeat the purpose of iterators if it was done
automatically. Or am I misunderstanding?
> E.g.:
>
> class snapshottable_sequence_iter(object):
>     def __init__(self, sequence, i=0):
>         self.sequence = sequence
>         self.i = i
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.sequence[self.i]
>         except IndexError: raise StopIteration
>         self.i += 1
>         return result
>     def snapshot(self):
>         return self.__class__(self.sequence, self.i)
>
>
> Here, snapshotting is quite cheap, requiring just a new counter and
> another reference to the same underlying sequence. So would be
> restarting and stepping back, directly implemented. But if we need
> to wrap a totally generic iterator to provide "snapshottability", we
> inevitably end up keeping a list (or the like) of items so far seen
> from one but not both 'independent' iterators obtained by a snapshot
> -- all potentially redundant storage, not to mention the possible
> coding trickiness in maintaining that FIFO queue.
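(For reference, wrapping a totally generic iterator along those lines
would come out roughly like the sketch below; the shared buffer is exactly
the redundant storage being described:)

    class snapshottable_iter(object):
        # Sketch only: wraps an arbitrary iterator; items that one view
        # has consumed but another has not must be buffered.
        def __init__(self, it, buffer=None, i=0):
            if buffer is None:
                buffer = []
            self.it = it          # shared underlying iterator
            self.buffer = buffer  # shared list of items already produced
            self.i = i            # this view's position within the buffer
        def __iter__(self): return self
        def next(self):
            if self.i == len(self.buffer):
                # at the frontier: pull a new item and remember it for
                # any snapshot that is lagging behind
                self.buffer.append(self.it.next())
            result = self.buffer[self.i]
            self.i += 1
            return result
        def snapshot(self):
            # the copy shares the iterator and the buffer but advances
            # independently; nothing is ever trimmed from the front here,
            # which is the storage (and bookkeeping) cost in question
            return self.__class__(self.it, self.buffer, self.i)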
I'm not sure what you are suggesting here. Are you proposing that
*some* iterators (those which can be snapshotted cheaply) sprout a new
snapshot() method?
> As I said I do have use cases for all of these. Simplest is the
> ability to push back the last item obtained by next, since a frequent
> pattern is:
> for item in iterator:
>     if isok(item): process(item)
>     else:
>         # need to push item back onto iterator, then
>         break
> else:
>     # all items were OK, iterator exhausted, blah blah
>
> ...and later...
>
> for item in iterator: # process some more items
>
> Of course, as long as just a few levels of pushback are enough, THIS
> one is an easy and light-weight wrapper to write:
>
> class pushback_wrapper(object):
>     def __init__(self, it):
>         self.it = it
>         self.pushed_back = []
>     def __iter__(self): return self
>     def next(self):
>         try: return self.pushed_back.pop()
>         except IndexError: return self.it.next()
>     def pushback(self, item):
>         self.pushed_back.append(item)
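(Used with the pattern above it would read something like this, where
items, isok, process and handle_rest stand for whatever the application
supplies:)

    it = pushback_wrapper(iter(items))
    for item in it:
        if isok(item): process(item)
        else:
            it.pushback(item)   # put the offending item back
            break
    else:
        pass                    # all items were OK, iterator exhausted

    for item in it:             # resumes with the pushed-back item
        handle_rest(item)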
This definitely sounds like you'd want to create an explicit wrapper
for this; there is too much machinery here to make this a standard
feature.
Perhaps a snapshottable iterator could also have a backup() method
(which would decrement self.i in your first example) or a prev()
method (which would return self.sequence[self.i] and decrement
self.i).
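Grafted onto your sequence iterator, purely as a sketch (the class name
is made up):

    class steppable_sequence_iter(snapshottable_sequence_iter):
        def backup(self):
            # undo the last call to next(): step the position back one
            self.i -= 1
        def prev(self):
            # return the item at the current position and step back
            result = self.sequence[self.i]
            self.i -= 1
            return result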
> A "snapshot" would be useful whenever more than one pass on a
> sequence _or part of it_ is needed (more useful than a "restart"
> because of the "part of it" provision). And a decent wrapper for it
> is a bear...
Such wrappers for specific container types (or maybe just one for
sequences) could be in a standard library module. Is more needed?
--Guido van Rossum (home page: http://www.python.org/~guido/)