[Python-checkins] CVS: python/nondist/peps pep-0234.txt,1.2,1.3

Guido van Rossum gvanrossum@users.sourceforge.net
Mon, 23 Apr 2001 11:31:48 -0700


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv20246

Modified Files:
	pep-0234.txt 
Log Message:
Almost completely rewritten, focusing on documenting the current state
of affairs, filling in some things still under discussion.

Ping, I hope this is okay with you.  If you want to revive "for
keys:values in dict" etc., you'll write a separate PEP, right?


Index: pep-0234.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0234.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -r1.2 -r1.3
*** pep-0234.txt	2001/02/19 06:08:07	1.2
--- pep-0234.txt	2001/04/23 18:31:46	1.3
***************
*** 2,6 ****
  Title: Iterators
  Version: $Revision$
! Author: ping@lfw.org (Ka-Ping Yee)
  Status: Draft
  Type: Standards Track
--- 2,6 ----
  Title: Iterators
  Version: $Revision$
! Author: ping@lfw.org (Ka-Ping Yee), guido@python.org (Guido van Rossum)
  Status: Draft
  Type: Standards Track
***************
*** 14,241 ****
      provide to control the behaviour of 'for' loops.  Looping is
      customized by providing a method that produces an iterator object.
!     The iterator should be a callable object that returns the next
!     item in the sequence each time it is called, raising an exception
!     when no more items are available.
  
  
- Copyright
- 
-     This document is in the public domain.
- 
- 
- Sequence Iterators
- 
-     A new field named 'sq_iter' for requesting an iterator is added
-     to the PySequenceMethods table.  Upon an attempt to iterate over
-     an object with a loop such as
- 
-         for item in sequence:
-             ...body...
- 
-     the interpreter looks for the 'sq_iter' of the 'sequence' object.
-     If the method exists, it is called to get an iterator; it should
-     return a callable object.  If the method does not exist, the
-     interpreter produces a built-in iterator object in the following
-     manner (described in Python here, but implemented in the core):
- 
-         def make_iterator(sequence):
-             def iterator(sequence=sequence, index=[0]):
-                 item = sequence[index[0]]
-                 index[0] += 1
-                 return item
-             return iterator
- 
-     To execute the above 'for' loop, the interpreter would proceed as
-     follows, where 'iterator' is the iterator that was obtained:
- 
-         while 1:
-             try:
-                 item = iterator()
-             except IndexError:
-                 break
-             ...body...
- 
-     (Note that the 'break' above doesn't translate to a "real" Python
-     break, since it would go to the 'else:' clause of the loop whereas
-     a "real" break in the body would skip the 'else:' clause.)
- 
-     The list() and tuple() built-in functions would be updated to use
-     this same iterator logic to retrieve the items in their argument.
- 
-     List and tuple objects would implement the 'sq_iter' method by
-     calling the built-in make_iterator() routine just described.
- 
-     Instance objects would implement the 'sq_iter' method as follows:
- 
-         if hasattr(self, '__iter__'):
-             return self.__iter__()
-         elif hasattr(self, '__getitem__'):
-             return make_iterator(self)
-         else:
-             raise TypeError, thing.__class__.__name__ + \
-                 ' instance does not support iteration'
- 
-     Extension objects can implement 'sq_iter' however they wish, as
-     long as they return a callable object.
- 
- 
- Mapping Iterators
- 
-     An additional proposal from Guido is to provide special syntax
-     for iterating over mappings.  The loop:
- 
-         for key:value in mapping:
- 
-     would bind both 'key' and 'value' to a key-value pair from the
-     mapping on each iteration.  Tim Peters suggested that similarly,
- 
-         for key: in mapping:
- 
-     could iterate over just the keys and
- 
-         for :value in mapping:
- 
-     could iterate over just the values.
- 
-     The syntax is unambiguous since the new colon is currently not
-     permitted in this position in the grammar.
- 
-     This behaviour would be provided by additional methods in the
-     PyMappingMethods table: 'mp_iteritems', 'mp_iterkeys', and
-     'mp_itervalues' respectively.  'mp_iteritems' is expected to
-     produce a callable object that returns a (key, value) tuple;
-     'mp_iterkeys' and 'mp_itervalues' are expected to produce a
-     callable object that returns a single key or value.
- 
-     The implementations of these methods on instance objects would
-     then check for and call the '__iteritems__', '__iterkeys__',
-     and '__itervalues__' methods respectively.
- 
-     When 'mp_iteritems', 'mp_iterkeys', or 'mp_itervalues' is missing,
-     the default behaviour is to do make_iterator(mapping.items()),
-     make_iterator(mapping.keys()), or make_iterator(mapping.values())
-     respectively, using the definition of make_iterator() above.
- 
- 
- Indexing Sequences
- 
-     The special syntax described above can be applied to sequences
-     as well, to provide the long-hoped-for ability to obtain the
-     indices of a sequence without the strange-looking 'range(len(x))'
-     expression.
- 
-         for index:item in sequence:
- 
-     causes 'index' to be bound to the index of each item as 'item' is
-     bound to the items of the sequence in turn, and
- 
-         for index: in sequence:
- 
-     simply causes 'index' to start at 0 and increment until an attempt
-     to get sequence[index] produces an IndexError.  For completeness,
- 
-         for :item in sequence:
- 
-     is equivalent to
- 
-         for item in sequence:
- 
-     In each case we try to request an appropriate iterator from the
-     sequence.  In summary:
- 
-         for k:v in x    looks for mp_iteritems, then sq_iter
-         for k: in x     looks for mp_iterkeys, then sq_iter
-         for :v in x     looks for mp_itervalues, then sq_iter
-         for v in x      looks for sq_iter
- 
-     If we fall back to sq_iter in the first two cases, we generate
-     indices for k as needed, by starting at 0 and incrementing.
- 
-     The implementation of the mp_iter* methods on instance objects
-     then checks for methods in the following order:
- 
-         mp_iteritems    __iteritems__, __iter__, items, __getitem__
-         mp_iterkeys     __iterkeys__, __iter__, keys, __getitem__
-         mp_itervalues   __itervalues__, __iter__, values, __getitem__
-         sq_iter         __iter__, __getitem__
- 
-     If a __iteritems__, __iterkeys__, or __itervalues__ method is
-     found, we just call it and use the resulting iterator.  If a
-     mp_* function finds no such method but finds __iter__ instead,
-     we generate indices as needed.
- 
-     Upon finding an items(), keys(), or values() method, we use
-     make_iterator(x.items()), make_iterator(x.keys()), or
-     make_iterator(x.values()) respectively.  Upon finding a
-     __getitem__ method, we use it and generate indices as needed.
- 
-     For example, the complete implementation of the mp_iteritems
-     method for instances can be roughly described as follows:
- 
-         def mp_iteritems(thing):
-             if hasattr(thing, '__iteritems__'):
-                 return thing.__iteritems__()
-             if hasattr(thing, '__iter__'):
-                 def iterator(sequence=thing, index=[0]):
-                     item = (index[0], sequence.__iter__())
-                     index[0] += 1
-                     return item
-                 return iterator
-             if hasattr(thing, 'items'):
-                 return make_iterator(thing.items())
-             if hasattr(thing, '__getitem__'):
-                 def iterator(sequence=thing, index=[0]):
-                     item = (index[0], sequence[index[0]])
-                     index[0] += 1
-                     return item
-                 return iterator
-             raise TypeError, thing.__class__.__name__ + \
-                 ' instance does not support iteration over items'
- 
- 
- Examples
- 
-     Here is a class written in Python that represents the sequence of
-     lines in a file.
- 
-         class FileLines:
-             def __init__(self, filename):
-                 self.file = open(filename)
-             def __iter__(self):
-                 def iter(self=self):
-                     line = self.file.readline()
-                     if line: return line
-                     else: raise IndexError
-                 return iter
- 
-         for line in FileLines('spam.txt'):
-             print line
- 
-     And here's an interactive session demonstrating the proposed new
-     looping syntax:
- 
-         >>> for i:item in ['a', 'b', 'c']:
-         ...     print i, item
-         ...
-         0 a
-         1 b
-         2 c
-         >>> for i: in 'abcdefg':        # just the indices, please
-         ...     print i,
-         ... print
-         ...
-         0 1 2 3 4 5 6
-         >>> for k:v in os.environ:      # os.environ is an instance, but
-         ...     print k, v              # this still works because we fall
-         ...                             # back to calling items()
-         MAIL /var/spool/mail/ping
-         HOME /home/ping
-         DISPLAY :0.0
-         TERM xterm
-         .
-         .
-         .
- 
- 
  Rationale
  
--- 14,257 ----
      provide to control the behaviour of 'for' loops.  Looping is
      customized by providing a method that produces an iterator object.
!     The iterator provides a 'get next value' operation that produces
!     the next item in the sequence each time it is called, raising an
!     exception when no more items are available.
! 
!     In addition, specific iterators over the keys of a dictionary and
!     over the lines of a file are proposed, and a proposal is made to
!     allow spelling dict.has_key(key) as "key in dict".
! 
!     Note: this is an almost complete rewrite of this PEP by the second
!     author, describing the actual implementation checked into the
!     trunk of the Python 2.2 CVS tree.  It is still open for
!     discussion.  Some of the more esoteric proposals in the original
!     version of this PEP have been withdrawn for now; these may be the
!     subject of a separate PEP in the future.
! 
! 
! C API Specification
! 
!     A new exception is defined, StopIteration, which can be used to
!     signal the end of an iteration.
! 
!     A new slot named tp_iter for requesting an iterator is added to
!     the type object structure.  This should be a function of one
!     PyObject * argument returning a PyObject *, or NULL.  To use this
!     slot, a new C API function PyObject_GetIter() is added, with the
!     same signature as the tp_iter slot function.
! 
!     Another new slot, named tp_iternext, is added to the type
!     structure, for obtaining the next value in the iteration.  To use
!     this slot, a new C API function PyIter_Next() is added.  The
!     signature for both the slot and the API function is as follows:
!     the argument is a PyObject * and so is the return value.  When the
!     return value is non-NULL, it is the next value in the iteration.
!     When it is NULL, there are three possibilities:
! 
!     - No exception is set; this implies the end of the iteration.
! 
!     - The StopIteration exception (or a derived exception class) is
!       set; this implies the end of the iteration.
! 
!     - Some other exception is set; this means that an error occurred
!       that should be propagated normally.
! 
!     In addition to the tp_iternext slot, every iterator object must
!     also implement a next() method, callable without arguments.  This
!     should have the same semantics as the tp_iternext slot function,
!     except that the only way to signal the end of the iteration is to
!     raise StopIteration.  The iterator object should not care whether
!     its tp_iternext slot function is called or its next() method, and
!     the caller may mix calls arbitrarily.  (The next() method is for
!     the benefit of Python code using iterators directly; the
!     tp_iternext slot is added to make 'for' loops more efficient.)
! 
!     To ensure binary backwards compatibility, a new flag
!     Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
!     field, and to the default flags macro.  This flag must be tested
!     before accessing the tp_iter or tp_iternext slots.  The macro
!     PyIter_Check() tests whether an object has the appropriate flag
!     set and has a non-NULL tp_iternext slot.  There is no such macro
!     for the tp_iter slot (since the only place where this slot is
!     referenced should be PyObject_GetIter()).
! 
!     (Note: the tp_iter slot can be present on any object; the
!     tp_iternext slot should only be present on objects that act as
!     iterators.)
! 
!     For backwards compatibility, the PyObject_GetIter() function
!     implements fallback semantics when its argument is a sequence that
!     does not implement a tp_iter function: a lightweight sequence
!     iterator object is constructed in that case which iterates over
!     the items of the sequence in the natural order.
! 
!     The Python bytecode generated for 'for' loops is changed to use
!     new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
!     rather than the sequence protocol to get the next value for the
!     loop variable.  This makes it possible to use a 'for' loop to loop
!     over non-sequence objects that support the tp_iter slot.  Other
!     places where the interpreter loops over the values of a sequence
!     should also be changed to use iterators.
! 
!     Iterators ought to implement the tp_iter slot as returning a
!     reference to themselves; this is needed to make it possible to
!     use an iterator (as opposed to a sequence) in a for loop.
! 
! 
! Python API Specification
! 
!     The StopIteration exception is made visible as one of the
!     standard exceptions.  It is derived from Exception.
! 
!     A new built-in function is defined, iter(), which can be called in
!     two ways:
! 
!     - iter(obj) calls PyObject_GetIter(obj).
! 
!     - iter(callable, sentinel) returns a special kind of iterator that
!       calls the callable to produce a new value, and compares the
!       return value to the sentinel value.  If the return value equals
!       the sentinel, this signals the end of the iteration and
!       StopIteration is raised rather than returning normally; if the
!       return value does not equal the sentinel, it is returned as the
!       next value from the iterator.  If the callable raises an
!       exception, this is propagated normally; in particular, the
!       function is allowed to raise StopIteration as an alternative way to
!       end the iteration.  (This functionality is available from the C
!       API as PyCallIter_New(callable, sentinel).)
! 
!     Iterator objects returned by either form of iter() have a next()
!     method.  This method either returns the next value in the
!     iteration, or raises StopIteration (or a derived exception class) to
!     signal the end of the iteration.  Any other exception should be
!     considered to signify an error and should be propagated normally,
!     not taken to mean the end of the iteration.
! 
!     Classes can define how they are iterated over by defining an
!     __iter__() method; this should take no additional arguments and
!     return a valid iterator object.  A class is a valid iterator
!     object when it defines a next() method that behaves as described
!     above.  A class that wants to be an iterator also ought to
!     implement __iter__() returning itself.
! 
!     There is some controversy here:
! 
!     - The name iter() is an abbreviation.  Alternatives proposed
!       include iterate(), harp(), traverse(), narrate().
! 
!     - Using the same name for two different operations (getting an
!       iterator from an object and making an iterator for a function
!       with a sentinel value) is somewhat ugly.  I haven't seen a
!       better name for the second operation though.
! 
! 
! Dictionary Iterators
! 
!     The following two proposals are somewhat controversial.  They are
!     also independent from the main iterator implementation.  However,
!     they are both very useful.
! 
!     - Dictionaries implement a sq_contains slot that implements the
!       same test as the has_key() method.  This means that we can write
! 
!           if k in dict: ...
! 
!       which is equivalent to
! 
!           if dict.has_key(k): ...
! 
!     - Dictionaries implement a tp_iter slot that returns an efficient
!       iterator that iterates over the keys of the dictionary.  During
!       such an iteration, the dictionary should not be modified, except
!       that setting the value for an existing key is allowed (deletions
!       or additions are not, nor is the update() method).  This means
!       that we can write
! 
!           for k in dict: ...
! 
!       which is equivalent to, but much faster than
! 
!           for k in dict.keys(): ...
! 
!       as long as the restriction on modifications to the dictionary
!       (either by the loop or by another thread) is not violated.
! 
!     There is no doubt that the dict.has_key(x) interpretation of "x
!     in dict" is by far the most useful interpretation, probably the
!     only useful one.  There has been resistance against this because
!     "x in list" checks whether x is present among the values, while
!     the proposal makes "x in dict" check whether x is present among
!     the keys.  Given that the symmetry between lists and dictionaries
!     is very weak, this argument does not have much weight.
! 
!     The main discussion focuses on whether
! 
!         for x in dict: ...
! 
!     should assign x the successive keys, values, or items of the
!     dictionary.  The symmetry between "if x in y" and "for x in y"
!     suggests that it should iterate over keys.  This symmetry has been
!     observed by many independently and has even been used to "explain"
!     one using the other.  This is because for sequences, "if x in y"
!     iterates over y comparing the iterated values to x.  If we adopt
!     both of the above proposals, this will also hold for
!     dictionaries.
! 
!     The argument against making "for x in dict" iterate over the keys
!     comes mostly from a practicality point of view: scans of the
!     standard library show that there are about as many uses of "for x
!     in dict.items()" as there are of "for x in dict.keys()", with the
!     items() version having a small majority.  Presumably many of the
!     loops using keys() use the corresponding value anyway, by writing
!     dict[x], so (the argument goes) by making both the key and value
!     available, we could support the largest number of cases.  While
!     this is true, I (Guido) find the correspondence between "for x in
!     dict" and "if x in dict" too compelling to break, and there's not
!     much overhead in having to write dict[x] to explicitly get the
!     value.  We could also add methods to dictionaries that return
!     different kinds of iterators, e.g.
! 
!         for key, value in dict.iteritems(): ...
! 
!         for value in dict.itervalues(): ...
! 
!         for key in dict.iterkeys(): ...
! 
! 
! File Iterators
! 
!     The following proposal is not controversial, but should be
!     considered a separate step after introducing the iterator
!     framework described above.  It is useful because it provides us
!     with a good answer to the complaint that the common idiom to
!     iterate over the lines of a file is ugly and slow.
! 
!     - Files implement a tp_iter slot that is equivalent to
!       iter(f.readline, "").  This means that we can write
! 
!           for line in file:
!               ...
! 
!       as a shorthand for
! 
!           for line in iter(file.readline, ""):
!               ...
! 
!       which is equivalent to, but faster than
! 
!           while 1:
!               line = file.readline()
!               if not line:
!                   break
!               ...
! 
!     This also shows that some iterators are destructive: they consume
!     all the values and a second iterator cannot easily be created that
!     iterates independently over the same values.  You could open the
!     file for a second time, or seek() to the beginning, but these
!     solutions don't work for all file types, e.g. they don't work when
!     the open file object really represents a pipe or a stream socket.
  
  
  Rationale
  
***************
*** 246,252 ****
      1. It provides an extensible iterator interface.
  
!     2. It resolves the endless "i indexing sequence" debate.
  
!     3. It allows performance enhancements to dictionary iteration.
  
      4. It allows one to provide an interface for just iteration
--- 262,268 ----
      1. It provides an extensible iterator interface.
  
!     2. It allows performance enhancements to list iteration.
  
!     3. It allows big performance enhancements to dictionary iteration.
  
      4. It allows one to provide an interface for just iteration
***************
*** 258,351 ****
         {__getitem__, keys, values, items}.
  
- 
- Errors
- 
-     Errors that occur during sq_iter, mp_iter*, or the __iter*__
-     methods are allowed to propagate normally to the surface.
- 
-     An attempt to do
- 
-         for item in dict:
- 
-     over a dictionary object still produces:
- 
-         TypeError: loop over non-sequence
- 
-     An attempt to iterate over an instance that provides neither
-     __iter__ nor __getitem__ produces:
- 
-         TypeError: instance does not support iteration
- 
-     Similarly, an attempt to do mapping-iteration over an instance
-     that doesn't provide the right methods should produce one of the
-     following errors:
- 
-         TypeError: instance does not support iteration over items
-         TypeError: instance does not support iteration over keys
-         TypeError: instance does not support iteration over values
- 
-     It's an error for the iterator produced by __iteritems__ or
-     mp_iteritems to return an object whose length is not 2:
  
!         TypeError: item iterator did not return a 2-tuple
! 
! 
! Open Issues
! 
!     We could introduce a new exception type such as IteratorExit just
!     for terminating loops rather than using IndexError.  In this case,
!     the implementation of make_iterator() would catch and translate an
!     IndexError into an IteratorExit for backward compatibility.
! 
!     We could provide access to the logic that calls either 'sq_item'
!     or make_iterator() with an iter() function in the built-in module
!     (just as the getattr() function provides access to 'tp_getattr').
!     One possible motivation for this is to make it easier for the
!     implementation of __iter__ to delegate iteration to some other
!     sequence.  Presumably we would then have to consider adding
!     iteritems(), iterkeys(), and itervalues() as well.
! 
!     An alternative way to let __iter__ delegate iteration to another
!     sequence is for it to return another sequence.  Upon detecting
!     that the object returned by __iter__ is not callable, the
!     interpreter could repeat the process of looking for an iterator
!     on the new object.  However, this process seems potentially
!     convoluted and likely to produce more confusing error messages.
! 
!     If we decide to add "freezing" ability to lists and dictionaries,
!     it is suggested that the implementation of make_iterator
!     automatically freeze any list or dictionary argument for the
!     duration of the loop, and produce an error complaining about any
!     attempt to modify it during iteration.  Since it is relatively
!     rare to actually want to modify it during iteration, this is
!     likely to catch mistakes earlier.  If a programmer wants to
!     modify a list or dictionary during iteration, they should
!     explicitly make a copy to iterate over using x[:], x.clone(),
!     x.keys(), x.values(), or x.items().
! 
!     For consistency with the 'key in dict' expression, we could
!     support 'for key in dict' as equivalent to 'for key: in dict'.
! 
! 
! BDFL Pronouncements
! 
!     The "parallel expression" to 'for key:value in mapping':
! 
!         if key:value in mapping:
! 
!     is infeasible since the first colon ends the "if" condition.
!     The following compromise is technically feasible:
! 
!         if (key:value) in mapping:
! 
!     but the BDFL has pronounced a solid -1 on this.
! 
!     The BDFL gave a +0.5 to:
! 
!         for key:value in mapping:
!         for index:item in sequence:
  
!     and a +0.2 to the variations where the part before or after
!     the first colon is missing.
  
  
--- 274,281 ----
         {__getitem__, keys, values, items}.
  
  
! Copyright
  
!     This document is in the public domain.
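
Illustration (not part of the checked-in PEP text): the Python-level
protocol described in the revised "Python API Specification" and "File
Iterators" sections can be sketched as below.  This is a minimal sketch
under stated assumptions.  The class Countdown and the helper
read_lines() are hypothetical names invented for this example.  The
sketch is written for present-day Python 3, where the next() method
described in this revision is spelled __next__(); the alias keeps the
PEP's spelling callable as well.

```python
import io


class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in 'for'.
        return self

    def __next__(self):
        # Signal the end of the iteration by raising StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Assumption: keep the spelling used in this PEP revision available.
    next = __next__


def read_lines(stream):
    """Collect lines using the two-argument form iter(callable, sentinel)."""
    # readline() returns "" at end of file, which is the sentinel here.
    return list(iter(stream.readline, ""))


if __name__ == "__main__":
    for n in Countdown(3):
        print(n)
    print(read_lines(io.StringIO("spam\neggs\n")))
```

The two-argument iter() call is the same construct the file iterator
proposal is defined in terms of; an in-memory stream stands in for a
real file so the sketch runs without touching the filesystem.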