[Tutor] iterators
spir
denis.spir at gmail.com
Sat Jan 18 12:13:02 CET 2014
On 01/18/2014 09:51 AM, Keith Winston wrote:
> I don't really get iterators. I saw an interesting example on
> Stackoverflow, something like
>
> with open('workfile', 'r') as f:
>     for a, b, c in zip(f, f, f):
>         ....
>
> And this iterated through a, b, c assigned to 3 consecutive lines of
> the file as it iterates through the file. I can sort of pretend that
> makes sense, but then I realize that other things that I thought were
> iterators aren't (lists and the range function)... I finally succeeded
> in mocking this up with a generator:
>
> gen = (i for i in range(20))
> for t1, t2, t3 in zip(gen, gen, gen):
>     print(t1, t2, t3)
>
> So I'm a little more confident of this... though I guess there's some
> subtlety of how zip works there that's sort of interesting. Anyway,
> the real question is, where (why?) else do I encounter iterators,
> since my two favorite examples, aren't... and why aren't they, if I
> can iterate over them (can't I? Isn't that what I'm doing with "for
> item in list" or "for index in range(10)")?
An iterator is a kind of object that delivers items one at a time. It is meant
to be used with python's "for ... in ..." construct.
Concretely, for each pass of such a 'for' loop, python calls the iterator's
__next__ method. If the call returns an item, it is used in the pass; if the
call raises StopIteration, then the loop stops. Here are two examples of
iterators (first ignore the __iter__ method, see below) and their usage:
======================================================
class Cubes:
    def __init__ (self, max):
        self.num = 0
        self.max = max
    def __next__ (self):
        if self.num > self.max:
            raise StopIteration()
        item = self.num * self.num * self.num
        self.num += 1
        return item
    def __iter__ (self):
        return self

cubes9 = Cubes(9)
for cube in cubes9:
    print(cube, end=' ')
print()

class Odds:
    def __init__ (self, lst):
        self.idx = 0
        self.lst = lst
    def __next__ (self):
        # find next odd item, if any:
        while self.idx < len(self.lst):
            item = self.lst[self.idx]
            self.idx += 1
            if item % 2 == 1:
                return item
        # if none:
        raise StopIteration()
    def __iter__ (self):
        return self

l = [0,1,2,3,4,5,6,7,8,9,10]
odds = Odds(l)
for odd in odds:
    print(odd, end=' ')
print()
======================================================
As you can see, the relevant bit is the __next__ method. Together with __iter__,
it forms the "iterator protocol", which all iterators are required to conform
to.
There is a little subtlety: sequences like lists are not iterators. For users to
be able to iterate over sequences like lists directly, *in code*:
    for item in lst:
instead of:
    for item in iter(lst):
python performs a little magic: a 'for' loop first calls iter() on whatever it
is given (here lst). This looks for an __iter__ method in the object, calls it,
and uses the iterator it returns (which must respect the iterator protocol).
This is why actual iterators are required to also have an __iter__ method, so
that iterators and sequences can be used in 'for' loops interchangeably. Since
iterators already are iterators, __iter__ just returns self in their case.
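To make that magic visible, here is a rough sketch of what a 'for' loop does
behind the scenes, written by hand with the builtins iter() and next(). The
function name manual_for is made up for illustration; this only mimics the
loop, it is not how the interpreter is actually implemented.

```python
def manual_for(iterable):
    """Collect the items of 'iterable' the way a 'for' loop fetches them."""
    items = []
    it = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(it)      # calls it.__next__()
        except StopIteration:    # the iterator is exhausted: leave the loop
            break
        items.append(item)       # stands in for the body of the loop
    return items

lst = [10, 20, 30]
print(manual_for(lst))           # same items as: for item in lst

it = iter(lst)
print(iter(it) is it)            # True: an iterator's __iter__ returns itself
print(hasattr(lst, '__next__'))  # False: a list is not an iterator
```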
Exercise: simulate python's iterator magic for lists. E.g. make a 'List' type
(subtype of list) and implement its __iter__ method. This should create an
iterator object of type, say, ListIter, which itself implements the iterator
protocol and correctly provides the list's items. (As you may guess, it is a
simpler version of my Odds type above.) (Dunno how to do that for sets or
dicts, since on the python side we have no access that I know of to their
actual storage of items/pairs. In fact, this applies to lists as well, but
indexing provides indirect access.)
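If you want to check your attempt afterwards, here is one possible sketch of
the exercise. The names List and ListIter are the ones suggested above; the
rest is just one way of doing it, relying on len() and indexing to reach the
items.

```python
class ListIter:
    """Iterator over a list, walking it by index."""
    def __init__(self, lst):
        self.lst = lst
        self.idx = 0
    def __next__(self):
        if self.idx >= len(self.lst):
            raise StopIteration()
        item = self.lst[self.idx]   # indexing gives indirect access to the items
        self.idx += 1
        return item
    def __iter__(self):
        return self                 # an iterator is its own iterator

class List(list):
    def __iter__(self):
        return ListIter(self)       # a fresh iterator for each traversal

letters = List("abc")
for letter in letters:
    print(letter, end=' ')
print()
```

Note that List.__iter__ builds a new ListIter on every call, so the same List
can be traversed again after a loop has finished.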
[Note, just to compare: in Lua, this little magic making builtin sequences
special does not exist. So, to iterate over all pairs or items of a Lua table,
one would write explicitly, resp.:
    for key,val in pairs(t)
    for idx,item in ipairs(t)
where pairs & ipairs resp. create iterators for (key,val) pairs or indexed items
of a table (tables are used like python dicts or lists). Functions pairs &
ipairs are builtin, but it's also trivial to make iterators (or generators) in
Lua: since it has 'free' objects, we don't even need classes for that.]
Now, one may wonder: why don't sequences implement the iterator protocol
themselves (actually, just __next__) and get rid of all that mess? Well, this
mess permits:
* a variety of traversals, with correspondingly different iterators, for the
*same* (kind of) collection; for instance traversing a list backward, traversing
trees breadth-first or depth-first, or only their leaves, or only nodes with
elements...
* the same collection to be traversed in several loops at once (rarely needed,
but still); concretely nested loops (in principle also from multiple threads
concurrently); and this does not break (as long as the list itself remains
unchanged)
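A small sketch of both points, which also shows why your zip(gen, gen, gen)
trick works: each loop over a list gets its own fresh iterator, whereas passing
one iterator three times means all three arguments share a single position.
The variable names are only for illustration.

```python
lst = [1, 2, 3]

# Nested loops over the same list: each 'for' asks the list for a new
# iterator, so the inner loop does not disturb the outer one.
pairs = [(a, b) for a in lst for b in lst]
print(len(pairs))                  # 9

# The same iterator passed three times to zip: zip takes one item from each
# argument in turn, but all three are the same object, so it advances three
# steps per tuple.
it = iter(range(6))
print(list(zip(it, it, it)))       # [(0, 1, 2), (3, 4, 5)]

# The same list passed three times: zip creates three independent iterators.
print(list(zip(lst, lst, lst)))    # [(1, 1, 1), (2, 2, 2), (3, 3, 3)]

# A different traversal of the same list, via a different iterator.
print(list(reversed(lst)))         # [3, 2, 1]
```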
Now, you may imagine that, since there are builtin iterators for all of python's
"iterable" types, and the corresponding magic is also builtin, and custom types
are constructed from builtin ones, there is rarely a need for making custom
iterators and mastering the corresponding lower-level functioning. And you'd
certainly be right ;-)
Why do you want to explore that, now?
Denis