Peek inside iterator (is there a PEP about this?)

Aaron "Castironpi" Brady castironpi at gmail.com
Wed Oct 1 23:07:14 EDT 2008


On Oct 1, 3:14 pm, Terry Reedy <tjre... at udel.edu> wrote:
> Luis Zarrabeitia wrote:
> > Hi there.
>
> > For most use cases I think about, the iterator protocol is more than enough.
> > However, on a few cases, I've needed some ugly hacks.
>
> > Ex 1:
>
> > a = iter([1,2,3,4,5]) # assume you got the iterator from a function and
> > b = iter([1,2,3])     # these two are just examples.
>
> > then,
>
> > zip(a,b)
>
> > has a different side effect from
>
> > zip(b,a)
>
> > After the execution, in the first case, iterator a contains just [5], on the
> > second, it contains [4,5]. I think the second one is correct (the 5 was never
> > used, after all). I tried to implement my 'own' zip, but there is no way to
> > know the length of the iterator (obviously), and there is also no way
> > to 'rewind' a value after calling 'next'.
>
> Interesting observation.  Iterators are intended for 'iterate through
> once and discard' usages.  To zip a long sequence with several short
> sequences, either use itertools.chain(short sequences) or put the short
> sequences as the first zip arg.
>
> > Ex 2:
>
> > Will this iterator yield any value? Like with most iterables, a construct
>
> > if iterator:
> >    # do something
>
> > would be a very convenient thing to have, instead of wrapping a 'next' call on
> > a try...except and consuming the first item.
>
> To test without consuming, wrap the iterator in a trivial-to-write
> one_ahead or peek class such as has been posted before.
>
> > Ex 3:
>
> > if any(iterator):
> >    # do something ... but the first true value was already consumed and
> >    # cannot be reused. "Any" cannot peek inside the iterator without
> >    # consuming the value.
>
> If you are going to do something with the true value, use a for loop and
> break.  If you just want to peek inside, use a sequence (list(iterator)).
>
> > Instead,
>
> > i1, i2 = tee(iterator)
> > if any(i1):
> >    # do something with i2
>
> This effectively makes two partial lists and tosses one.  That may or
> may not be a better idea.
>
> > Question/Proposal:
>
> > Has there been any PEP regarding the problem of 'peeking' inside an iterator?
>
> Iterators are not sequences and, in general, cannot be made to act like
> them.  The iterator protocol is a bare-minimum, least-common-denominator
> requirement for inter-operability.  You can, of course, add methods to
> iterators that you write for the cases where one-ahead or random access
> *is* possible.
>
> > Knowing if the iteration will end or not, and/or accessing the next value,
> > without consuming it? Is there any (simple, elegant) way around it?
>
> That much is trivial.  As suggested above, write a wrapper with the
> exact behavior you want.  A sample (untested)
>
> class one_ahead():
>    "Self.peek is the next item or undefined"
>    def __init__(self, iterator):
>      try:
>        self.peek = next(iterator)
>        self._it = iterator
>      except StopIteration:
>        pass
>    def __bool__(self):
>      return hasattr(self, 'peek')
>    def __next__(self): # 3.0, 2.6?
>      try:
>        next = self.peek
>        try:
>          self.peek = next(self._it)
>        except StopIteration:
>          del self.peek
>        return next
>      except AttrError:
>        raise StopIteration
>
> Terry Jan Reedy

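Terry's is close.  For 2.5 it needs '__nonzero__' instead of '__bool__',
an '__iter__' method, 'next' instead of '__next__', 'self._it.next( )'
instead of 'next(self._it)', and 'AttributeError' instead of 'AttrError'.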

Then just define your own 'peekzip'.  Short:

def peekzip( *itrs ):
    while 1:
        if not all( itrs ):
            return  # ends the generator, same effect as raising StopIteration
        yield tuple( [ itr.next( ) for itr in itrs ] )

In some cases, you could require 'one_ahead' instances in peekzip, or
create them yourself in new iterators.

Here is your output: The first part uses zip, the second uses peekzip.

[(1, 1), (2, 2), (3, 3)]
5
[(1, 1), (2, 2), (3, 3)]
4

4 is what you expect.

Here's the full code.

class one_ahead(object):
    "Self.peek is the next item or undefined"
    def __init__(self, iterator):
        try:
            self.peek = iterator.next( )
            self._it = iterator
        except StopIteration:
            pass
    def __nonzero__(self):
        return hasattr(self, 'peek')
    def __iter__(self):
        return self
    def next(self): # spelled '__next__' in 3.0
        try:
            item = self.peek  # 'item', so the name 'next' is not shadowed
            try:
                self.peek = self._it.next( )
            except StopIteration:
                del self.peek
            return item
        except AttributeError:
            raise StopIteration


a= one_ahead( iter( [1,2,3,4,5] ) )
b= one_ahead( iter( [1,2,3] ) )
print zip( a,b )
print a.next()

def peekzip( *itrs ):
    while 1:
        if not all( itrs ):
            return  # ends the generator, same effect as raising StopIteration
        yield tuple( [ itr.next( ) for itr in itrs ] )

a= one_ahead( iter( [1,2,3,4,5] ) )
b= one_ahead( iter( [1,2,3] ) )
print list( peekzip( a,b ) )
print a.next()
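
For anyone reading this on Python 3, here is a rough port of the same idea.
It is only a sketch: the names 'OneAhead' and '_advance' are mine, not
anything from the standard library, and I made peekzip stop immediately when
called with no arguments, which the 2.5 version above does not do.

```python
class OneAhead:
    """Iterator wrapper; self.peek is the next item, or unset when exhausted."""

    def __init__(self, iterator):
        self._it = iter(iterator)
        self._advance()

    def _advance(self):
        # Pull one item ahead; drop the attribute once the source runs dry.
        try:
            self.peek = next(self._it)
        except StopIteration:
            if hasattr(self, "peek"):
                del self.peek

    def __bool__(self):      # spelled __nonzero__ in 2.x
        return hasattr(self, "peek")

    def __iter__(self):
        return self

    def __next__(self):      # spelled next in 2.x
        if not self:
            raise StopIteration
        item = self.peek
        self._advance()
        return item


def peekzip(*itrs):
    # Check every wrapper before taking anything, so nothing is consumed
    # in the round where one of them turns out to be empty.
    while itrs and all(itrs):
        yield tuple([next(itr) for itr in itrs])


a = OneAhead(iter([1, 2, 3, 4, 5]))
b = OneAhead(iter([1, 2, 3]))
print(list(zip(a, b)))      # [(1, 1), (2, 2), (3, 3)]
print(next(a))              # 5, because zip already took the 4

a = OneAhead(iter([1, 2, 3, 4, 5]))
b = OneAhead(iter([1, 2, 3]))
print(list(peekzip(a, b)))  # [(1, 1), (2, 2), (3, 3)]
print(next(a))              # 4
```

The generator ends by falling out of the loop instead of raising
StopIteration, since Python 3.7 and later turn a StopIteration raised inside
a generator body into a RuntimeError (PEP 479).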

There's one more option, which is to create your own 'push-backable'
class, which accepts a 'previous( item )' message.

(Hypothetical session, not actually run)
>>> a= push_backing( iter( [1,2,3,4,5] ) )
>>> a.next( )
1
>>> a.next( )
2
>>> a.previous( 2 )
>>> a.next( )
2
>>> a.next( )
3
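
One possible way to write that class, as a Python 3 sketch.  Nothing here is
an existing API: 'PushBacking', its '_pushed' stack and the 'pushback_zip'
helper are all names I am assuming for illustration.  Pushed-back items go on
a stack, so the most recently returned item comes out first.

```python
class PushBacking:
    """Iterator wrapper with a previous(item) method that un-reads an item."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []    # stack of pushed-back items, last in first out

    def __iter__(self):
        return self

    def __next__(self):
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def previous(self, item):
        self._pushed.append(item)


def pushback_zip(*itrs):
    # Like zip, but when one iterator runs out mid-round, the items already
    # taken from the earlier iterators in that round are handed back.
    while itrs:
        taken = []
        for itr in itrs:
            try:
                taken.append(next(itr))
            except StopIteration:
                for earlier, item in zip(itrs, taken):
                    earlier.previous(item)
                return
        yield tuple(taken)


a = PushBacking([1, 2, 3, 4, 5])
b = PushBacking([1, 2, 3])
print(list(pushback_zip(a, b)))  # [(1, 1), (2, 2), (3, 3)]
print(next(a))                   # 4, the item taken in the failed round was pushed back
```

Compared with one_ahead, this never reads ahead of what the caller asked for,
at the cost of every consumer having to know about 'previous'.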



