Iterators, generators and 2.2 (was RE: do...until wisdom needed...)

Tim Peters tim.one at home.com
Sat Apr 21 18:56:30 EDT 2001


[Aahz]
> Generators (called "iterators") are going to be in 2.2.  They'll be
> more powerful than Icon's generators; it's not clear whether they'll
> be a full-fledged substitute for coroutines.

[Neelakantan Krishnaswami]
> {\mr_burns Excellent.}
>
> Is this the iterator idea that Ping posted about a couple of months
> back? What is the PEP number for this? I'm curious how the existing
> iteration protocol will interact with the new iterators.

This is getting confused.  Iterators != generators (sorry, Aahz! it's more
involved than that).

Aahz gave you the PEP number for iterators, and last night Guido checked an
initial implementation into the 2.2 CVS tree.  In Python terms, "for" setup
looks for an __iter__ method first, and if it doesn't find it but does find
__getitem__, builds a lightweight iterator around the __getitem__ method
instead.  So the "for" loop works only with iterators now, but there's an
adapter providing iterators by magic for old sequence objects that don't know
about iterators:

C:\Code\python\dist\src\PCbuild>python
Python 2.2a0 (#16, Apr 20 2001, 23:16:12) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> def f(s):
...     for i in s:
...         print i
...
>>> from dis import dis
>>> dis(f)
          0 SET_LINENO               1

          3 SET_LINENO               2
          6 SETUP_LOOP              25 (to 34)
          9 LOAD_FAST                0 (s)
         12 GET_ITER

    >>   13 SET_LINENO               2
         16 FOR_ITER                14
         19 STORE_FAST               1 (i)

         22 SET_LINENO               3
         25 LOAD_FAST                1 (i)
         28 PRINT_ITEM
         29 PRINT_NEWLINE
         30 JUMP_ABSOLUTE           13
         33 POP_BLOCK
    >>   34 LOAD_CONST               0 (None)
         37 RETURN_VALUE
>>>

The backward compatibility layer described above is hiding in the new
GET_ITER opcode.  Of course builtin lists (and so on) now fill in the
iterator slot themselves, so GET_ITER simply returns their iterator
directly.  Loops are less complicated (internally) now, and run
significantly faster.
User-defined types and classes no longer *need* to (ab)use __getitem__ to
implement iteration (which is of particular interest to Greg Wilson right
now, who is implementing a Set class and doesn't *want* to define __getitem__
because it's semantically senseless).
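To make the split concrete: a class that defines __iter__ hands "for" an
iterator directly, while an old-style class exposing only __getitem__ still
works through the adapter.  A sketch (the class names here are made up for
illustration):

```python
class Evens:
    # New protocol: __iter__ hands "for" an iterator directly.
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(0, self.n, 2))


class Squares:
    # Old protocol: no __iter__, so "for" wraps __getitem__ in the
    # lightweight adapter iterator, probing indexes 0, 1, 2, ...
    # until IndexError is raised.
    def __getitem__(self, i):
        if i >= 4:
            raise IndexError
        return i * i
```

Both iterate fine under the new scheme; only the machinery underneath
differs.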

None of that should be controversial in the least.  More controversial is
that iteration over dict keys has been tentatively added (and note that this
is another thing made *possible* by breaking the old connection between
__getitem__ and iteration):

>>> dict = {"one": 1, "two": 2}
>>> for k in dict:
...     print k
...
one
two
>>>

This is significantly faster, and unboundedly more memory-efficient, than
doing "for k in dict.keys()".

The dict.__contains__ slot was also filled in, so that "k in dict" is
synonymous with "dict.has_key(k)", but again runs significantly faster:

>>> "one" in dict
1
>>> "three" in dict
0
>>>

File objects have also grown iterators, so that, e.g.,

    for line in sys.stdin:
        print line

now works.
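The same effect is available for anything with a readline-flavored method,
via the two-argument form of the new iter() builtin: iter(callable,
sentinel) calls the callable repeatedly until it returns the sentinel.  A
sketch, using an in-memory file object as a stand-in for sys.stdin (the
StringIO class used here is just a convenient way to fake a file):

```python
import io

# readline returns "" at end of file, so "" is the sentinel that
# stops the iteration.
f = io.StringIO("alpha\nbeta\ngamma\n")
lines = [line for line in iter(f.readline, "")]
```

After this, lines holds the three lines of the fake file, newlines and all.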

Iterators can be explicitly materialized too, via the new iter() builtin
function, and invoked apart from the "for" protocol:

>>> i1 = iter(dict)
>>> i1
<dictionary-iterator object at 0079A250>
>>> dir(i1)
['next']
>>> print i1.next.__doc__
it.next() -- get the next value, or raise StopIteration
>>> i2 = iter(dict)
>>> i1.next()
'one'
>>> i2.next()
'one'
>>> i1.next()
'two'
>>> i2.next()
'two'
>>> i1.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>

Note that this allows a simple memory-efficient implementation of parallel
sequence iteration too.  For example, this program:

class zipiter:
    def __init__(self, seq1, *moreseqs):
        self.seqs = (seq1,) + moreseqs

    def __iter__(self):
        # one sub-iterator per sequence; "for" calls this at loop setup
        self.iters = [iter(seq) for seq in self.seqs]
        return self

    def next(self):
        # the first exhausted sub-iterator raises StopIteration,
        # which ends the "for" loop
        return [i.next() for i in self.iters]

for i, j, k in zipiter([1, 2, 3], "abc", (5., 6., 7., 8.)):
    print i, j, k

prints

1 a 5.0
2 b 6.0
3 c 7.0


Now all that is just iteration in a thoroughly conventional sense.  There is
no support here for generators or coroutines or microthreads, except in the
sense that breaking the iteration==__getitem__ connection makes it easier to
think about *how* generators may be implemented, and having an explicit
iterator object "should" make it possible to go beyond Icon's notion of
generators (which can only be driven implicitly by control context).
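To make "driven explicitly" concrete: because an iterator is a first-class
object, the caller can decide at each step *which* stream to advance, which
an implicitly-driven generator can't offer.  A sketch merging two sorted
streams (the merge and _step names are made up here; next(it) stands for
invoking the iterator's next method):

```python
_END = object()   # sentinel marking an exhausted iterator

def _step(it):
    # fetch the next value, or _END when the iterator is used up
    try:
        return next(it)
    except StopIteration:
        return _END

def merge(a, b):
    # Advance whichever iterator holds the smaller pending value --
    # the caller, not the loop machinery, chooses which one steps.
    ia, ib = iter(a), iter(b)
    x, y = _step(ia), _step(ib)
    out = []
    while x is not _END or y is not _END:
        if y is _END or (x is not _END and x <= y):
            out.append(x)
            x = _step(ia)
        else:
            out.append(y)
            y = _step(ib)
    return out
```

Note that the two iterators advance at different rates depending on the
data, a control pattern a single implicit for-loop can't express.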

Neil Schemenauer is currently thinking hard about that "in his spare time",
but there's no guarantee anything will come of it in 2.2.  Iterators are a
sure thing, though (not least because they're already implemented!).

not-only-implemented-but-feel-exactly-right-ly y'rs  - tim




