[Tutor] iterators

Peter Otten __peter__ at web.de
Sat Jan 18 10:50:21 CET 2014


Keith Winston wrote:

> I don't really get iterators. I saw an interesting example on
> Stackoverflow, something like
> 
> with open('workfile', 'r') as f:
>     for a, b, c in zip(f, f, f):
> ....
> 
> And this iterated through a, b, c assigned to 3 consecutive lines of
> the file as it iterates through the file. I can sort of pretend that
> makes sense, but then I realize that other things that I thought were
> iterators aren't (lists and the range function)... I finally succeeded
> in mocking this up with a generator:
> 
> gen = (i for i in range(20))
> for t1, t2, t3 in zip(gen, gen, gen):
>     print(t1, t2, t3)
> 
> So I'm a little more confident of this... though I guess there's some
> subtlety of how zip works there that's sort of interesting. Anyway,
> the real question is, where (why?) else do I encounter iterators,
> since my two favorite examples, aren't... and why aren't they, if I
> can iterate over them (can't I? Isn't that what I'm doing with "for
> item in list" or "for index in range(10)")?

You can get an iterator from a list or range object by calling iter():

>>> iter([1, 2, 3])
<list_iterator object at 0xeb0510>
>>> iter(range(10))
<range_iterator object at 0xeaccf0>

Every time you call iter() on a sequence you get a new iterator, but when you 
call iter() on an iterator, the iterator should return itself.
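
To make that distinction concrete, here is a small sketch; the variable names 
are only for illustration. It checks that a list is iterable without being an 
iterator itself, that each iter() call on it gives a fresh iterator, and that 
iter() on an iterator hands back the same object.

```python
numbers = [1, 2, 3]

# A list can be iterated over, but it has no __next__ method of its own,
# so it is an iterable rather than an iterator.
list_is_iterator = hasattr(numbers, "__next__")

first = iter(numbers)
second = iter(numbers)

# Two iter() calls on the same list give two independent iterators.
fresh_each_time = first is not second

# Calling iter() on an iterator returns that very iterator.
returns_itself = iter(first) is first

# Advancing one iterator does not affect the other.
next(first)
second_still_at_start = next(second)
```

Here list_is_iterator is False, the other two flags are True, and 
second_still_at_start is 1.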

A for loop calls iter() implicitly on the object it loops over. So you can 
think of the for-loop

for item in range(10):
    print(item)

as syntactic sugar for

tmp = iter(range(10))
while True:
    try:
        item = next(tmp)
    except StopIteration:
        break
    print(item)
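
The expansion above only relies on iter() and next(), so any object that 
supports both can be looped over. As a rough sketch, here is a hand-written 
iterator; the class Countdown and the helper manual_for are made-up names for 
this illustration, not anything from the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself when iter() is called on it.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals the end of the iteration to for loops and next().
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Collect items the same way the expanded while-loop does."""
    items = []
    tmp = iter(iterable)
    while True:
        try:
            item = next(tmp)
        except StopIteration:
            break
        items.append(item)
    return items


looped = [item for item in Countdown(3)]
expanded = manual_for(Countdown(3))
```

Both looped and expanded end up as [3, 2, 1].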

Back to your zip() example. Here is a possible implementation of zip():

>>> def myzip(*iterables):
...     iterators = [iter(it) for it in iterables]
...     while True:
...         t = tuple(next(it) for it in iterators)
...         if len(t) < len(iterators):
...             break
...         yield t
... 

When you pass it a range object twice there will be two distinct iterators 
in the `iterators` list that are iterated over in parallel:

>>> rfive = range(5)
>>> list(myzip(rfive, rfive))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

but when you pass the same iterator twice, then, because 
iter(some_iter) is some_iter 
holds (i.e. the iter() function is "idempotent"), the calls

next(iterators[0])

and

next(iterators[1])

operate on the same iterator and you get the behaviour seen on 
Stack Overflow:

>>> ifive = iter(range(5))
>>> list(myzip(ifive, ifive))
[(0, 1), (2, 3)]
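
The same reasoning applies to the file example from the question, because a 
file object is its own iterator. A small sketch with the built-in zip(), 
assuming an io.StringIO with made-up contents as a stand-in for the open file:

```python
import io

# io.StringIO stands in for the file; like a real file object it is its own
# iterator, so all three arguments to zip() share one position in the text.
f = io.StringIO("a1\nb1\nc1\na2\nb2\nc2\n")
triples = [(a.strip(), b.strip(), c.strip()) for a, b, c in zip(f, f, f)]

# A list is not an iterator: zip() gets three independent iterators from it,
# so every tuple just repeats the same item.
lines = ["a1", "b1", "c1"]
repeated = list(zip(lines, lines, lines))
```

triples is [('a1', 'b1', 'c1'), ('a2', 'b2', 'c2')], while repeated is 
[('a1', 'a1', 'a1'), ('b1', 'b1', 'b1'), ('c1', 'c1', 'c1')].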

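PS: There is an odd difference in the behaviour of list-comps and generator 
expressions. Before Python 3.7 the latter swallow a StopIteration raised 
inside them, which is why the first myzip() above needs the len() test. Since 
Python 3.7 (PEP 479) a generator expression turns it into a RuntimeError 
instead, so the session below shows the older behaviour: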

>>> iterators = [iter(range(3))] * 10
>>> [next(it) for it in iterators]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
StopIteration
>>> iterators = [iter(range(3))] * 10
>>> tuple(next(it) for it in iterators)
(0, 1, 2)
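
On Python 3.7 and later the generator expression no longer ends silently; the 
StopIteration is converted into a RuntimeError (PEP 479). The sketch below is 
one possible way to keep the len()-test version of myzip() working there, 
assuming map() is used in place of the generator expression: map() is not a 
generator, so tuple() still sees an ordinary StopIteration and just stops 
early. The helper name take_round is made up for this illustration.

```python
def take_round(iterators):
    """Take one item from each iterator using a generator expression."""
    try:
        return tuple(next(it) for it in iterators)
    except RuntimeError as exc:
        # Python 3.7+: the StopIteration raised by next() inside the
        # generator expression has been converted (PEP 479).
        return "RuntimeError: {}".format(exc)


genexp_result = take_round([iter(range(3))] * 10)

# map() passes the StopIteration through unchanged, so tuple() ends early.
map_result = tuple(map(next, [iter(range(3))] * 10))


def myzip(*iterables):
    iterators = [iter(it) for it in iterables]
    while iterators:  # with no arguments there is nothing to yield
        t = tuple(map(next, iterators))
        if len(t) < len(iterators):
            # At least one iterator ran out before the tuple was complete.
            break
        yield t


rfive = range(5)
ifive = iter(range(5))
distinct = list(myzip(rfive, rfive))
shared = list(myzip(ifive, ifive))
```

map_result is (0, 1, 2) on any version, distinct is 
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)] and shared is [(0, 1), (2, 3)].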

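With a list-comp myzip could be simplified; catching the StopIteration inside 
the generator keeps it working on Python 3.7 and later, where a StopIteration 
escaping from a generator function becomes a RuntimeError:

>>> def myzip(*iterables):
...     iterators = [iter(it) for it in iterables]
...     while True:
...         try:
...             t = [next(it) for it in iterators]
...         except StopIteration:
...             return
...         yield tuple(t)
... 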
>>> list(myzip(*[iter(range(10))]*3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
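
Note that the last item, 9, is dropped because it cannot fill a complete 
tuple. The same trick works with the built-in zip(), and 
itertools.zip_longest() can be used when the incomplete tail should be kept. 
A small sketch; the helper names chunks and chunks_padded are made up for 
this example:

```python
from itertools import zip_longest


def chunks(iterable, n):
    """Hypothetical helper: n-tuples of consecutive items, tail dropped."""
    it = iter(iterable)
    # The same iterator n times, so each tuple consumes n consecutive items.
    return zip(*[it] * n)


def chunks_padded(iterable, n, fillvalue=None):
    """Hypothetical helper: like chunks(), but pads the last tuple."""
    it = iter(iterable)
    return zip_longest(*[it] * n, fillvalue=fillvalue)


dropped = list(chunks(range(10), 3))
padded = list(chunks_padded(range(10), 3))
```

dropped is [(0, 1, 2), (3, 4, 5), (6, 7, 8)] and padded additionally ends 
with (9, None, None).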



