Snapshottable re-iterable iterators
David Abrahams
dave at boost-consulting.com
Tue Jun 17 21:22:18 EDT 2003
Beni Cherniavsky <cben at techunix.technion.ac.il> writes:
> From time to time I wanted to iterate the same iterator more than once
> - but destructive iterators don't allow this. So I wrote a class for
> wrapping destructive iterators. It gives you an iterable, whose
> __iter__ goes over the values of the underlying iterator - but those
> are only requested once. To make it fancy, I made sure that when you
> release the iterable and iterate with all your iterators on it past
> any point, no refernces remain to values older than it (I used a
> linked list, bulit from two-item lists).
>
> Then I thought that I do not want this multiple-iteration ability
> always from the start. I wanted (and built) an iterator that could be
> "snapshotted" at any moment - and the "snapshot" is a new iterator,
> returning the same values from this point in time. It's similar in
> spirit to the `fork` system call. Makes it easy to e.g. implement
> lookahead on a stream - you fork the stream iterator, iterate on the
> "child" stream and then can continue with the
>
> Looking for a good method name for this snapshotting, I thought of
> `__iter__` - that's where I ask for your comments. I got an iterator
> that is destructive in the sense that calling ``.next()`` on it
> advances it irreversibly - but it is an iterable at the same time and
> calling ``.__iter__()`` on it creates a new iterator, with the same
> future.
This is a very nice solution to a problem I once wrestled with... if
you don't mind the memory cost, of course!
> So the question is: is it a good idea to violate the iterator protocol
> in this way - being an iterator but returning a "copy" of self from
> `__iter__` rather than self? On one hand, it seems cute. On the
> other, it makes it hard to avoid the forking when you don't want it.
When don't you want it (other than to save memory)?
After all, normally, once you execute:
for x in iterable:
....
iterable is thereafter useless. I think semantically, most code would
never be able to detect the difference.
> If I'll go with another method rather than `__iter__`, the best
> alternatives seem to be `copy` and `fork`.
I think you've done exactly the right thing. Beautiful idea!
Incidentally, Andrew Koenig once invented a similar iterator over
linked lists for C++.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
More information about the Python-list
mailing list