[Tutor] iterators

spir denis.spir at gmail.com
Sat Jan 18 12:13:02 CET 2014


On 01/18/2014 09:51 AM, Keith Winston wrote:
> I don't really get iterators. I saw an interesting example on
> Stackoverflow, something like
>
> with open('workfile', 'r') as f:
>      for a, b, c in zip(f, f, f):
> ....
>
> And this iterated through a, b, c assigned to 3 consecutive lines of
> the file as it iterates through the file. I can sort of pretend that
> makes sense, but then I realize that other things that I thought were
> iterators aren't (lists and the range function)... I finally succeeded
> in mocking this up with a generator:
>
> gen = (i for i in range(20))
> for t1, t2, t3 in zip(gen, gen, gen):
>      print(t1, t2, t3)
>
> So I'm a little more confident of this... though I guess there's some
> subtlety of how zip works there that's sort of interesting. Anyway,
> the real question is, where (why?) else do I encounter iterators,
> since my two favorite examples, aren't... and why aren't they, if I
> can iterate over them (can't I? Isn't that what I'm doing with "for
> item in list" or "for index in range(10)")?

An iterator is a kind of object that delivers items one at a time. It is meant 
to be used with python's "for ... in ..." construct.

Concretely, on each pass of such a 'for' loop, python calls the iterator's 
__next__ method. If the call returns an item, it is used in the pass; if the 
call raises StopIteration, then the loop stops. Here are two examples of 
iterators (ignore the __iter__ method at first; it is explained below) and 
their usage:

======================================================
class Cubes:
     def __init__ (self, max):
         self.num = 0
         self.max = max

     def __next__ (self):
         if self.num > self.max:
             raise StopIteration()
         item = self.num * self.num * self.num
         self.num += 1
         return item

     def __iter__ (self):
         return self

cubes9 = Cubes(9)

for cube in cubes9:
     print(cube, end=' ')
print()

class Odds:
     def __init__ (self, lst):
         self.idx = 0
         self.lst = lst

     def __next__ (self):
         # find next odd item, if any:
         while self.idx < len(self.lst):
             item = self.lst[self.idx]
             self.idx += 1
             if item % 2 == 1:
                 return item
         # if none:
         raise StopIteration()

     def __iter__ (self):
         return self

l = [0,1,2,3,4,5,6,7,8,9,10]
odds = Odds(l)
for odd in odds:
     print(odd, end=' ')
print()
======================================================

As you can see, the relevant bit is the __next__ method. This and __iter__ are 
the 2 methods forming the "iterator protocol", which iterators are required to 
conform to.
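
To make the role of __next__ and StopIteration more tangible, here is a rough 
sketch of what a 'for' loop does with an iterator, written out by hand. The 
names Countdown and run_loop are made up for this illustration; the real loop 
is of course implemented inside the interpreter, not like this.

```python
class Countdown:
    """Hypothetical example iterator: counts from start down to 1."""
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration()
        item = self.current
        self.current -= 1
        return item

    def __iter__(self):
        return self

def run_loop(iterator):
    """Fetch items the way a 'for' loop would, collecting them in a list."""
    collected = []
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:        # the signal that no items are left
            break
        collected.append(item)       # the body of the loop would run here
    return collected

items = run_loop(Countdown(3))
print(items)    # [3, 2, 1]
```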

There is a little subtlety: sequences like lists are not iterators. For users to 
be able to iterate over sequences like lists directly, *in code*:
	for item in lst:
instead of:
	for item in iter(lst):
python performs a little magic: at the start of the loop it asks the object 
passed (here lst) for an iterator, as iter() does. That is, it looks for an 
__iter__ method in the object, calls it, and if this returns an iterator 
(respecting the iterator protocol), then the loop uses that iterator. This is 
why actual iterators are required to also have an __iter__ method, so that 
iterators and sequences can be used in 'for' loops interchangeably. Since 
iterators already are iterators, __iter__ just returns self in their case.
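
A small check of the above, which also sheds some light on the zip(gen, gen, 
gen) example from the question. This is only a sketch; the variable names are 
mine. When the *same* iterator is passed three times, every item taken by zip 
comes from that one object, hence the grouping by three. A list passed three 
times gets three independent iterators instead.

```python
lst = [1, 2, 3]

# a list has no __next__: it is iterable, but not an iterator
list_has_next = hasattr(lst, '__next__')        # False

# iter() asks the list for an iterator, which does have __next__
it = iter(lst)
iter_has_next = hasattr(it, '__next__')         # True

# an iterator's __iter__ just returns the iterator itself
same_object = iter(it) is it                    # True

# one shared iterator: zip draws consecutive items from it
numbers = iter(range(6))
triples = list(zip(numbers, numbers, numbers))
print(triples)      # [(0, 1, 2), (3, 4, 5)]

# a list: each argument of zip gets its own fresh iterator
repeated = list(zip(lst, lst, lst))
print(repeated)     # [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
```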

Exercise: simulate python's iterator magic for lists. E.g. make a 'List' type 
(subtype of list) and implement its __iter__ method. This should create an 
iterator object of type, say, ListIter which itself implements the iterator 
protocol, and indeed correctly provides the list's items. (As you may guess, it 
is a simpler version of my Odds type above.) (Dunno how to do that for sets or 
dicts, since on the python side we have no access I know of to their actual 
storage of items/pairs. In fact, this applies to lists as well, but indexing 
provides indirect access.)
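
In case you get stuck, here is one possible sketch of a solution (certainly not 
the only one). It uses the List and ListIter names suggested above and relies 
on len() and indexing to reach the items; the sample data is made up.

```python
class ListIter:
    """Iterator over any object supporting len() and integer indexing."""
    def __init__(self, lst):
        self.lst = lst
        self.idx = 0

    def __next__(self):
        if self.idx >= len(self.lst):
            raise StopIteration()
        item = self.lst[self.idx]
        self.idx += 1
        return item

    def __iter__(self):
        # an iterator returns itself
        return self

class List(list):
    """A list whose iteration goes through our own ListIter."""
    def __iter__(self):
        # a new, independent iterator on each call
        return ListIter(self)

letters = List(['a', 'b', 'c'])
collected = []
for letter in letters:
    collected.append(letter)
print(collected)    # ['a', 'b', 'c']
```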

[Note, just to compare: in Lua, this little magic making builtin sequences 
special does not exist. So, to iterate over all pairs or indexed items of a Lua 
table, one would write explicitly, resp.:
	for key,val in pairs(t)
	for idx,item in ipairs(t)
where pairs & ipairs resp. create iterators for (key,val) pairs or indexed items 
of a table (used as python dicts or lists). Functions pairs & ipairs are 
builtin, but it's also trivial to make iterators (or generators) in Lua: since 
it has 'free' objects, we don't even need classes for that.]

Now, one may wonder why sequences don't implement the iterator protocol 
themselves (actually, just __next__) and get rid of all that mess? Well, this 
mess permits:
* a variety of traversals, with corresponding different iterators, for the *same* 
(kind of) collections; for instance traversing a list backward, traversing trees 
breadth-first or depth-first or only their leaves, or only nodes with elements...
* the same collection to be traversed in several loops at once (rarely needed, 
but still); concretely nested loops (in principle also from multiple threads 
concurrently); and this does not break (as long as the list itself remains 
unchanged)
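
Both points can be tried out quickly. In the sketch below (names and data are 
mine), each 'for' over the list gets its own iterator, so the nested loops do 
not disturb each other, and reversed() provides a different traversal of the 
same list. For contrast, the last part shares one single iterator between two 
nested loops, which is roughly what would happen if the list itself were the 
iterator.

```python
lst = ['a', 'b', 'c']

# nested loops over the same list: two independent iterators
pairs = []
for x in lst:
    for y in lst:
        pairs.append(x + y)
print(len(pairs))       # 9

# another kind of traversal over the same list
backward = list(reversed(lst))
print(backward)         # ['c', 'b', 'a']

# one iterator shared by both loops: the inner loop exhausts it
shared = iter(lst)
shared_pairs = []
for x in shared:
    for y in shared:
        shared_pairs.append(x + y)
print(shared_pairs)     # ['ab', 'ac']
```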

Now, you may imagine that, since there are builtin iterators for all of python's 
"iterable" types, and the corresponding magic is also builtin, and custom 
types are constructed from builtin ones, then there is rarely a need for making 
custom iterators and mastering the corresponding lower-level functioning. And 
you'd certainly be right ;-)

Why do you want to explore that, now?

Denis


