[Python-Dev] Shall I start adding iterators to Python 2.2?

Thu, 19 Apr 2001 23:29:45 -0500

I've got a fairly complete implementation of iterators along the lines
of Ping's PEP (slightly updated).  This is available for inspection
through CVS: just check out the CVS branch of python/dist/src named
"iter-branch" from SourceForge:

  cvs checkout -r iter-branch -d <directory> python/dist/src

(This branch was forked off the trunk around the time of version
2.1b1, so it's not up to date with Python 2.1, but it's good enough to
show off iterators.)

My question is: should I just merge this code onto the trunk (making
it part of 2.2), or should we review the design more before committing
to this implementation?

Brief overview of what I've got implemented:

- There is a new built-in operation, spelled as iter(obj) in Python
  and as PyObject_GetIter(obj) in C; it calls the tp_iter slot in
  obj's type.  This returns an iterator, which can be any object that
  implements the iterator protocol.  The iterator protocol defines one
  method, next(), which returns the next value or raises the new
  StopIteration exception.

- For backwards compatibility, if obj's type does not have a valid
  tp_iter slot, iter(obj) and PyObject_GetIter(obj) fall back to
  creating a sequence iterator, which fetches obj[0], obj[1], ... in
  order until IndexError is raised.

- "for x in S: B" is implemented roughly as

  __iter = iter(S)
  while 1:
    try:
      x = __iter.next()
    except StopIteration:
      break
    B

  (except that the semantics of break when there's an else clause are
  different from what this Python code would do).

- The test "key in dict" is implemented as "dict.has_key(key)".  (This
  was done by implementing the sq_contains slot.)

- iter(dict) returns an iterator that iterates over the keys of dict
  without creating a list of keys first.  This means that "for key in
  dict" has the same effect as "for key in dict.keys()" as long as the
  loop body doesn't modify the dictionary (assignment to existing keys
  is okay).

- There's an operation to create an iterator from a function and a
  sentinel value.  This is spelled as iter(function, sentinel).  For
  example,

    for line in iter(sys.stdin.readline, ""):
      ...

  is an efficient loop over the lines of stdin.

- But even cooler is this, which is totally equivalent:

    for line in sys.stdin:
      ...

- Not yet implemented, but part of the plan, is to use iterators for
  all other implicit loops, like map/reduce/filter, min/max, and the
  "in" test for sequences that don't define sq_contains.

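To make the overview above concrete, here is a minimal sketch of the
protocol in use.  It is an illustration, not the code on the branch.
The names Countdown, Squares and for_loop are hypothetical, and the
StringIO object is only a stand-in for sys.stdin.  One assumption is
labeled in the code: later interpreters spell the next() method as
__next__, so the sketch accepts either spelling in order to run there
as well.

```python
import io


class Countdown:
    """Hypothetical iterator: yields n, n-1, ..., 1, then stops."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Python-level counterpart of the tp_iter slot; an iterator
        # can simply return itself.
        return self

    def next(self):
        # The one method of the iterator protocol.
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

    # Assumption: later interpreters look for __next__ instead, so
    # alias it to keep the sketch runnable there too.
    __next__ = next


class Squares:
    """Hypothetical old-style sequence: __getitem__ only, no __iter__."""

    def __getitem__(self, i):
        if i >= 4:
            raise IndexError(i)
        return i * i


def for_loop(iterable, body):
    """Spell out "for x in iterable: body(x)" by hand (no else clause)."""
    it = iter(iterable)
    # Accept either spelling of the method (see the alias above).
    step = getattr(it, "next", None) or it.__next__
    while 1:
        try:
            x = step()
        except StopIteration:
            break
        body(x)


seen = []
for_loop(Countdown(3), seen.append)   # explicit iterator object
for_loop(Squares(), seen.append)      # sequence-iterator fallback
for_loop({"a": 1}, seen.append)       # a dict iterates over its keys

# iter(function, sentinel): call readline() until it returns "".
stream = io.StringIO("spam\neggs\n")  # stand-in for sys.stdin
lines = list(iter(stream.readline, ""))
```

After this runs, seen holds 3, 2, 1 from Countdown, then 0, 1, 4, 9
from Squares, then the key "a"; lines holds the two lines read from
the stream, each with its trailing newline.
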
--Guido van Rossum (home page: http://www.python.org/~guido/)