Snapshottable re-iterable iterators

David Abrahams dave at boost-consulting.com
Tue Jun 17 21:22:18 EDT 2003


Beni Cherniavsky <cben at techunix.technion.ac.il> writes:

> From time to time I wanted to iterate the same iterator more than once
> - but destructive iterators don't allow this.  So I wrote a class for
> wrapping destructive iterators.  It gives you an iterable, whose
> __iter__ goes over the values of the underlying iterator - but those
> are only requested once.  To make it fancy, I made sure that when you
> release the iterable and iterate with all your iterators on it past
> any point, no references remain to values older than it (I used a
> linked list, built from two-item lists).
>
> Then I thought that I do not want this multiple-iteration ability
> always from the start.  I wanted (and built) an iterator that could be
> "snapshotted"  at any moment - and the "snapshot" is a new iterator,
> returning the same values from this point in time.  It's similar in
> spirit to the `fork` system call.  Makes it easy to e.g. implement
> lookahead on a stream - you fork the stream iterator, iterate on the
> "child" stream and then can continue with the
>
> Looking for a good method name for this snapshotting, I thought of
> `__iter__` - that's where I ask for your comments.  I got an iterator
> that is destructive in the sense that calling ``.next()`` on it
> advances it irreversibly - but it is an iterable at the same time and
> calling ``.__iter__()`` on it creates a new iterator, with the same
> future.

This is a very nice solution to a problem I once wrestled with... if
you don't mind the memory cost, of course!
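
For readers who want to see the shape of the idea, here is a minimal
sketch in Python 3. It is not Beni's actual class: the name
`Snapshottable`, the `_node` parameter, and the exact node layout are
assumptions made for illustration, and it uses `__next__` where the
2003 post refers to ``.next()``. The only parts taken from the post are
the linked list built from small lists and the `__iter__` that returns
a fork rather than self.

```python
class Snapshottable:
    """Iterator whose __iter__ returns a fork with the same future.

    Hypothetical sketch, not the original poster's class.
    """

    def __init__(self, source, _node=None):
        # Every fork shares one underlying iterator and one chain of nodes.
        self._source = iter(source)
        # A node is a list: [] means "not fetched yet", [value, next_node]
        # holds a fetched value, and a one-item list marks exhaustion.
        self._node = [] if _node is None else _node

    def __iter__(self):
        # Deliberately not `return self`: hand out a snapshot instead.
        return Snapshottable(self._source, self._node)

    def __next__(self):
        node = self._node
        if not node:  # nobody has requested this value yet
            try:
                node.extend((next(self._source), []))
            except StopIteration:
                node.append(None)
        if len(node) == 1:
            raise StopIteration
        value, self._node = node
        return value


stream = Snapshottable("abcd")
first = next(stream)                 # consume one item from the parent
child = iter(stream)                 # fork at this point
peeked = [next(child), next(child)]  # look ahead on the child only
resumed = next(stream)               # parent continues from where it was
```

Nodes only point forward, so once every live iterator has moved past a
node nothing refers to it any more and it can be collected. The memory
cost is the span between the slowest and the fastest iterator.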

> So the question is: is it a good idea to violate the iterator protocol
> in this way - being an iterator but returning a "copy" of self from
> `__iter__` rather than self?  On one hand, it seems cute.  On the
> other, it makes it hard to avoid the forking when you don't want it.

When don't you want it (other than to save memory)?

After all, normally, once you execute:

      for x in iterable:
          ....

iterable is thereafter useless.  I think semantically, most code would
never be able to detect the difference.
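
A small illustration of that point with an ordinary Python 3 iterator
(the helper name `consume_until` is made up for this example). A loop
that runs to completion leaves nothing behind, so whether `__iter__`
returned self or a fork makes no visible difference. The case where
code could plausibly tell the two apart is a loop that breaks early and
then keeps reading from the same object.

```python
def consume_until(iterator, stop):
    """Loop over `iterator`, breaking at the first item equal to `stop`."""
    for item in iterator:
        if item == stop:
            break


# A loop that runs to the end: the plain iterator is exhausted afterwards.
finished = iter([1, 2, 3])
for _ in finished:
    pass
leftover = list(finished)

# A loop that breaks early: a plain iterator has been advanced by the loop,
# so the remaining items start after the break point.
plain = iter(range(5))
consume_until(plain, 1)
rest = list(plain)
```

With a forking `__iter__`, the `for` loop inside `consume_until` would
presumably advance only the fork, so the caller's iterator would still
start from its old position. Code that relies on the loop having moved
the iterator is the kind that would notice.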

> If I go with another method rather than `__iter__`, the best
> alternatives seem to be `copy` and `fork`.

I think you've done exactly the right thing.  Beautiful idea!

Incidentally, Andrew Koenig once invented a similar iterator over
linked lists for C++.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com




More information about the Python-list mailing list