[Python-Dev] Reiterability

Guido van Rossum guido at python.org
Sat Oct 18 13:17:40 EDT 2003


[Guido]
> >Oh, no.  Not reiterability again.  How can you promise something to be
> >reiterable if you don't know whether the underlying iterator can be
> >reiterated?  Keeping a hidden buffer would be a bad idea.

[Alex]
> I agree it would be bad to have "black magic" performed by every
> iterator to fulfil a contract that may or may not be useful to
> clients and might be costly to fulfil.
> 
> IF "reiterability" is useful (and I'd need to see some use cases,
> because I don't particularly recall pining for it in Python) it
> should be exposed as a separate protocol that may or may not be
> offered by any given iterator type.  E.g., the presence of a special
> method __reiter__ could indicate that this iterator IS able to
> supply another iterator which retraces the same steps from the
> start; and perhaps iter(xxx, reiterable=True) could strive to
> provide a reiterable iterator for xxx, which might justify building
> one that keeps a hidden buffer as a last resort.  But first, I'd
> like use cases...

In cases where reiterability can be implemented without much effort,
there is already an underlying object representing the sequence
(e.g. a collection object, or an object defining a numerical series).
Reiteration comes for free if you hold on to that underlying object
rather than passing around an iterator over it.
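
For example (a small sketch; the list and the doubling are only stand-ins
for the underlying object and for f):

  s = [1, 2, 3]                # the underlying object
  it = iter(s)

  once  = [x*2 for x in it]    # consumes the iterator
  again = [x*2 for x in it]    # empty: the iterator is exhausted

  # holding on to s itself gives reiteration for free
  once  = [x*2 for x in s]
  again = [x*2 for x in s]     # each loop calls iter(s) anew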

[Phillip]
> I think I phrased my question poorly.  What I should have said was:
> 
> "Should iterator expressions preserve the reiterability of the base
> expression?"

(An iterator expression being something like

  (f(x) for x in S)

right?)

> I don't want to make them guarantee reiterability, only to preserve
> it if it already exists.  Does that make more sense?
>
> In essence, this would be done by having an itercomp expression
> resolve to an object whose __iter__ method calls the underlying
> generator, returning a generator-iterator.  Thus, any iteration over
> the itercomp is equivalent to calling a no-arguments generator.  The
> result is reiterable if the base iterable is reiterable, otherwise
> not.

OK, I think I understand what you're after.  The code for an iterator
expression has to create a generator function behind the scenes, and
call it.  For example:

  A = (f(x) for x in S)

could be translated into:

  def gen(seq):
      for x in seq:
          yield f(x)
  A = gen(S)

(Note that S could be an arbitrary expression and should be evaluated
only once.  This translation does that correctly.)
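
To make that concrete (make_seq, f and calls are made up for the
illustration):

  calls = []
  def make_seq():
      calls.append(1)          # record each evaluation of the S expression
      return [10, 20, 30]

  def f(x):
      return x + 1

  def gen(seq):
      for x in seq:
          yield f(x)

  A = gen(make_seq())          # the S expression is evaluated here, once
  list(A)                      # one pass over A
  assert len(calls) == 1       # and S was built exactly once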

This allows one to iterate once over A (the generator-iterator returned
by gen() cannot be restarted).  What you are asking looks like it could
be done like this (never mind the local names):

  def gen(seq):
      for x in seq:
          yield f(x)
  class Helper:
      def __init__(self, seq):
          self.seq = seq
      def __iter__(self):
          return gen(self.seq)
  A = Helper(S)

Then every time you use iter(A), gen() will be called with the saved
value of S as its argument.
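
So, assuming S itself can be iterated more than once (a list, say, with f
any function), A can be looped over repeatedly:

  S = [1, 2, 3]
  A = Helper(S)
  first  = list(A)             # __iter__ calls gen(S) afresh
  second = list(A)             # another fresh gen(S); same items again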

> I suppose technically, this means the itercomp doesn't return an
> iterator, but an iterable, which I suppose could be confusing if you
> try to call its 'next()' method.  But then, it could have a next()
> method that raises an error saying "call 'iter()' on me first".

I don't mind that so much, but I don't think all the extra machinery
is worth it; the compiler generally can't tell if it is needed so it
has to produce the reiterable code every time.  If you *want* to
have an iterable instead of an iterator, it's usually easy enough to do
(especially given knowledge about the type of S).
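
For instance, when the caller knows S is a plain sequence, a two-line
generator function gives the iterable-on-demand version (a sketch; the
name mapped is made up):

  def mapped(func, seq):
      # a fresh generator-iterator per call; reiterable as long as seq is
      for x in seq:
          yield func(x)

  first  = list(mapped(f, S))  # first pass
  second = list(mapped(f, S))  # second pass over the same items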

[Alex again]
> There ARE other features I'd REALLY have liked to get from iterators
> in some applications.
> 
> A "snapshot" -- providing me two iterators, the original one and
> another, which will step independently over the same sequence of
> items -- would have been really handy at times.  And a "step back"
> facility ("undo" of the last call to next) -- sometimes one level
> would suffice, sometimes not; often I could have provided the item
> to be "pushed back" so the iterator need not retain memory of it
> independently, but that wouldn't always be handy.  Now any of these
> can be built as a wrapper over an existing iterator, of course --
> just like 'reiterability' could (and you could in fact easily
> implement reiterability in terms of snapshotting, by just ensuring a
> snapshot is taken at the start and further snapshotted but never
> disturbed); but not knowing the abilities of the underlying iterator
> would mean these wrappers would often duplicate functionality
> needlessly.

I don't see how it can be done without an explicit request for such a
wrapper in the calling code.  If the underlying iterator is ephemeral
(is not reiterable) the snapshotter has to save a copy of every item,
and that would defeat the purpose of iterators if it was done
automatically.  Or am I misunderstanding?
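
To spell out the cost: a generic wrapper has to remember everything one
cursor has produced that another cursor may still need, which with
arbitrary snapshots means every item.  A rough sketch (the class name and
the ever-growing shared buffer are just for illustration):

  class Snapshottable(object):
      def __init__(self, it):
          self.it = it         # the underlying, possibly ephemeral iterator
          self.buffer = []     # every item produced so far -- the cost
          self.pos = 0
      def __iter__(self):
          return self
      def next(self):          # Python 2 iterator protocol, as elsewhere here
          if self.pos == len(self.buffer):
              self.buffer.append(self.it.next())   # fetch and remember
          item = self.buffer[self.pos]
          self.pos += 1
          return item
      def snapshot(self):
          clone = Snapshottable(self.it)
          clone.buffer = self.buffer               # shared with the original
          clone.pos = self.pos
          return clone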

> E.g.:
> 
> class snapshottable_sequence_iter(object):
>     def __init__(self, sequence, i=0):
>         self.sequence = sequence
>         self.i = i
>     def __iter__(self): return self
>     def next(self):
>         try: result = self.sequence[self.i]
>         except IndexError: raise StopIteration
>         self.i += 1
>         return result
>     def snapshot(self):
>         return self.__class__(self.sequence, self.i)
> 
> Here, snapshotting is quite cheap, requiring just a new counter and
> another reference to the same underlying sequence.  So would be
> restarting and stepping back, directly implemented.  But if we need
> to wrap a totally generic iterator to provide "snapshottability", we
> inevitably end up keeping a list (or the like) of items so far seen
> from one but not both 'independent' iterators obtained by a snapshot
> -- all potentially redundant storage, not to mention the possible
> coding trickiness in maintaining that FIFO queue.

I'm not sure what you are suggesting here.  Are you proposing that
*some* iterators (those which can be snapshotted cheaply) sprout a new
snapshot() method?

> As I said I do have use cases for all of these.  Simplest is the
> ability to push back the last item obtained by next, since a frequent
>     pattern is:
>     for item in iterator:
>         if isok(item): process(item)
>         else:
>             # need to push item back onto iterator, then
>             break
>     else:
>         # all items were OK, iterator exhausted, blah blah
> 
>     ...and later...
> 
>     for item in iterator:    # process some more items
> 
> Of course, as long as just a few levels of pushback are enough, THIS
> one is an easy and light-weight wrapper to write:
> 
> class pushback_wrapper(object):
>     def __init__(self, it):
>         self.it = it
>         self.pushed_back = []
>     def __iter__(self): return self
>     def next(self):
>         try: return self.pushed_back.pop()
>         except IndexError: return self.it.next()
>     def pushback(self, item):
>         self.pushed_back.append(item)

This definitely sounds like you'd want to create an explicit wrapper
for this; there is too much machinery here to make this a standard
feature.
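
With such an explicit wrapper the quoted pattern becomes, roughly (isok,
process and iterator are the names from the example above):

  it = pushback_wrapper(iterator)
  for item in it:
      if isok(item): process(item)
      else:
          it.pushback(item)    # put the offending item back, then stop
          break
  else:
      pass                     # all items were OK, iterator exhausted

  # ...and later...
  for item in it:              # resumes with the pushed-back item
      process(item)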

Perhaps a snapshottable iterator could also have a backup() method
(which would decrement self.i in your first example) or a prev()
method (which would return self.sequence[self.i] and decrement
self.i).
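
Against the first example that might look something like this (a rough
sketch; the subclass name is made up, boundary checks are omitted, and
prev() follows one plausible reading -- step back one position, then hand
that item out again):

  class steppable_sequence_iter(snapshottable_sequence_iter):
      def backup(self):
          self.i -= 1          # undo the last next()
      def prev(self):
          self.i -= 1                    # step back...
          return self.sequence[self.i]   # ...and re-deliver that item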

> A "snapshot" would be useful whenever more than one pass on a
> sequence _or part of it_ is needed (more useful than a "restart"
> because of the "part of it" provision).  And a decent wrapper for it
> is a bear...

Such wrappers for specific container types (or maybe just one for
sequences) could be in a standard library module.  Is more needed?

--Guido van Rossum (home page: http://www.python.org/~guido/)


