Status of PEP's?

Wed Feb 27 18:52:34 EST 2002

Below is a revision (not yet submitted) of PEP 276.

Rather than restart the entire debate on PEP 276, please send me
suggestions for specific changes to the PEP.

Thanks,

Jim

===============================================================

PEP: 276
Title: Simple Iterator for ints
Version: $Revision: 1.1 $
Last-Modified: $Date: 2001/11/13 20:52:37 $
Author: james_althoff at i2.com (Jim Althoff)
Status: Draft
Type: Standards Track
Created: 12-Nov-2001
Python-Version: 2.3
Post-History:

Abstract

    Python 2.1 added new functionality to support iterators[1].
    Iterators have proven to be useful and convenient in many coding
    situations.  It is noted that the implementation of Python's
    for-loop control structure uses the iterator protocol as of
    release 2.1.  It is also noted that Python provides iterators for
    the following builtin types: lists, tuples, dictionaries, strings,
    and files.  This PEP proposes the addition of an iterator for the
    builtin type int (types.IntType).  Such an iterator would simplify
    the coding of certain for-loops in Python.

Specification

    Define an iterator for types.intType (i.e., the builtin type
    "int") that is returned from the builtin function "iter" when
    called with an instance of types.intType as the argument.

    The returned iterator has the following behavior:

    - Assume that object i is an instance of types.intType (the
      builtin type int) and that i > 0

    - iter(i) returns an iterator object

    - said iterator object iterates through the sequence of ints
      0,1,2,...,i-1

    Example:

        iter(5) returns an iterator object that iterates through the
        sequence of ints 0,1,2,3,4

    - if i <= 0, iter(i) returns an "empty" iterator, i.e., one that
      throws StopIteration upon the first call of its "next" method

    In other words, the conditions and semantics of said iterator is
    consistent with the conditions and semantics of the range() and
    xrange() functions.

    Note that the sequence 0,1,2,...,i-1 associated with the int i is
    considered "natural" in the context of Python programming because
    it is consistent with the builtin indexing protocol of sequences
    in Python.  Python lists and tuples, for example, are indexed
    starting at 0 and ending at len(object)-1 (when using positive
    indices).  In other words, such objects are indexed with the
    sequence 0,1,2,...,len(object)-1

Rationale

    A common programming idiom is to take a collection of objects and
    apply some operation to each item in the collection in some
    established sequential order.  Python provides the "for in"
    looping control structure for handling this common idiom.  Cases
    arise, however, where it is necessary (or more convenient) to
    access each item in an "indexed" collection by iterating through
    each index and accessing each item in the collection using the
    corresponding index.

    For example, one might have a two-dimensional "table" object where one
    requires the application of some operation to the first column of
    each row in the table.  Depending on the implementation of the table
    it might not be possible to access first each row and then each
    column as individual objects.  It might, rather, be possible to
    access a cell in the table using a row index and a column index.
    In such a case it is necessary to use an idiom where one iterates
    through a sequence of indices (indexes) in order to access the
    desired items in the table.  (Note that the commonly used
    DefaultTableModel class in Java-Swing-Jython has this very protocol).

    Another common example is where one needs to process two or more
    collections in parallel.  Another example is where one needs to
    access, say, every second item in a collection.

    There are many other examples where access to items in a
    collection is facilitated by a computation on an index thus
    necessitating access to the indices rather than direct access to
    the items themselves.

    Let's call this idiom the "indexed for-loop" idiom.  Some
    programming languages provide builtin syntax for handling this
    idiom.  In Python the common convention for implementing the
    indexed for-loop idiom is to use the builtin range() or xrange()
    function to generate a sequence of indices as in, for example:

     for rowcount in range(table.getRowCount()):
         print table.getValueAt(rowcount, 0)

    or

     for rowcount in xrange(table.getRowCount()):
         print table.getValueAt(rowcount, 0)

    From time to time there are discussions in the Python community
    about the indexed for-loop idiom.  It is sometimes argued that the
    need for using the range() or xrange() function for this design
    idiom is:

    - Not obvious (to new-to-Python programmers),

    - Error prone (easy to forget, even for experienced Python
      programmers)

    - Confusing and distracting for those who feel compelled to understand
      the differences and recommended usage of xrange() vis-a-vis range()

    - Unwieldy, especially when combined with the len() function,
      i.e., xrange(len(sequence))

    - Not as convenient as equivalent mechanisms in other languages,

    - Annoying, a "wart", etc.

    And from time to time proposals are put forth for ways in which
    Python could provide a better mechanism for this idiom.  Recent
    examples include PEP 204, "Range Literals", and PEP 212, "Loop
    Counter Iteration".

    Most often, such proposal include changes to Python's syntax and
    other "heavyweight" changes.

    Part of the difficulty here is that advocating new syntax implies
    a comprehensive solution for "general indexing" that has to
    include aspects like:

    - starting index value

    - ending index value

    - step value

    - open intervals versus closed intervals versus half opened intervals

    Finding a new syntax that is comprehensive, simple, general,
    Pythonic, appealing to many, easy to implement, not in conflict
    with existing structures, not excessively overloading of existing
    structures, etc. has proven to be more difficult than one might
    anticipate.

    The proposal outlined in this PEP tries to address the problem by
    suggesting a simple "lightweight" solution that helps the most
    common case by using a proven mechanism that is already available
    (as of Python 2.1): namely, iterators.

    Because for-loops already use "iterator" protocol as of Python
    2.1, adding an iterator for types.IntType as proposed in this PEP
    would enable by default the following shortcut for the indexed
    for-loop idiom:

     for rowcount in table.getRowCount():
         print table.getValueAt(rowcount, 0)

    The following benefits for this approach vis-a-vis the current
    mechanism of using the range() or xrange() functions are claimed
    to be:

    - Simpler,

    - Less cluttered,

    - Focuses on the problem at hand without the need to resort to
      secondary implementation-oriented functions (range() and
      xrange())

    And compared to other proposals for change:

    - Requires no new syntax

    - Requires no new keywords

    - Takes advantage of the new and well-established iterator mechanism

    And generally:

    -  Is consistent with iterator-based "convenience" changes already
       included (as of Python 2.1) for other builtin types such as:
       lists, tuples, dictionaries, strings, and files.

Backwards Compatibility

    The proposed mechanism is generally backwards compatible as it
    calls for neither new syntax nor new keywords.  All existing,
    valid Python programs should continue to work unmodified.

    However, this proposal is not perfectly backwards compatible in
    the sense that certain statements that are currently invalid
    would, under the current proposal, become valid.

    Tim Peters has pointed out two such examples:

    1) The common case where one forgets to include range() or
       xrange(), for example:

        for rowcount in table.getRowCount():
            print table.getValueAt(rowcount, 0)

       in Python 2.2 raises a TypeError exception.

       Under the current proposal, the above statement would be valid
       and would work as (presumably) intended.  Presumably, this is a
       good thing.

       As noted by Tim, this is the common case of the "forgotten
       range" mistake (which one currently corrects by adding a call
       to range() or xrange()).

    2) The (hopefully) very uncommon case where one makes a typing
       mistake when using tuple unpacking.  For example:

           x, = 1

       in Python 2.2 raises a TypeError exception.

       Under the current proposal, the above statement would be valid
       and would set x to 0.  The PEP author has no data as to how
       common this typing error is nor how difficult it would be to
       catch such an error under the current proposal.  He imagines
       that it does not occur frequently and that it would be
       relatively easy to correct should it happen.

Issues:

    Extensive discussions concerning PEP 276 on the Python interest
    mailing list suggests a range of opinions: some in favor, some
    neutral, some against.  Those in favor tend to agree with the
    claims above of the usefulness, convenience, and simplicity of
    a simple iterator for integers.

    Issues with PEP 276 include:

    - Some feel that iterating over the sequence "0, 1, 2, ..., n-1"
      for an integer n is not intuitive.  "for i in 5:" is considered
      (by some) to be "non-obvious", for example.

      Response: This seems to be largely a matter of preference.
      Some like the proposed idiom and see it as simple and
      elegant.  Some are neutral on this issue.  Others, as noted,
      dislike it.

    - Is it obvious that iter(5) maps to the sequence 0,1,2,3,4?

      Response: Given, as noted above, that Python has a strong
      convention for indexing sequences starting at 0 and stopping at
      (inclusively) the index whose value is one less than the length
      of the sequence, it is argued that the proposed sequence is
      reasonably intuitive to a Python programmer while being useful
      and practical.  More importantly, it is argued that once learned
      this convention is very easy to remember.

    - Possible ambiguity

          for i in 10: print i

      might be mistaken for

     for i in (10,): print i

      Response: This is exactly the same situation with strings in
      current Python (replace 10 with 'spam' in the above, for
      example).

    - Might dissuade newbies from using the indexed for-loop idiom when
      the standard "for item in collection:" idiom is clearly better.

      Response: The standard idiom is so nice when it fits that it
      needs neither extra "carrot" nor "stick".  On the other hand,
      one does notice cases of overuse/misuse of the standard idiom
      (due, most likely, to the awkwardness of the indexed for-loop
      idiom), as in:

       for item in sequence:
           print sequence.index(item)

    The majority of disagreement with PEP 276 came from those who
    favor much larger changes to Python to address the more general
    problem of specifying a sequence of integers where such
    a specification is general enough to handle the starting value,
    ending value, and stepping value of the sequence and also
    addresses variations of open, closed, and half-open (half-closed)
    integer intervals.  Many suggestions of such were discussed.

    These include:

    - adding Haskell-like notation for specifying a sequence of
      integers in a literal list,

    - various uses of slicing notation to specify sequences,

    - changes to the syntax of for-in loops to allow the use of
      relational operators in the loop header,

    - creation of an integer-interval class along with methods that
      overload relational operators or division operators
      to provide "slicing" on integer-interval objects,

    - and more.

    It should be noted that there was much debate but not an
    overwhelming concensus for any of these larger-scale suggestions.

    Clearly, PEP 276 does not propose such a large-scale change
    and instead focuses on a specific problem area.  Towards the
    end of the discussion period, several posters expressed favor
    for the narrow focus and simplicity of PEP 276 vis-a-vis the more
    ambitious suggestions that were advanced.  There did appear to be
    concensus for the need for a PEP for any such larger-scale,
    alternative suggestion.  In light of this recognition, details of
    the various alternative suggestions are not discussed here further.

Implementation

    An implementation is not available at this time but is expected
    to be straightforward.

References

    [1] PEP 234, Iterators
    http://python.sourceforge.net/peps/pep-0234.html

    [2] PEP 204, Range Literals
    http://python.sourceforge.net/peps/pep-0204.html

    [3] PEP 212, Loop Counter Iteration
    http://python.sourceforge.net/peps/pep-0212.html

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: