[Python-checkins] python/nondist/peps pep-0323.txt, NONE, 1.1 pep-0000.txt, 1.254, 1.255

aleax at users.sourceforge.net aleax at users.sourceforge.net
Mon Oct 27 10:01:11 EST 2003


Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv9167

Modified Files:
	pep-0000.txt 
Added Files:
	pep-0323.txt 
Log Message:
Added PEP 323, "Copyable Iterators".



--- NEW FILE: pep-0323.txt ---
PEP: 323
Title: Copyable Iterators
Version: $Revision: 1.1 $
Last-Modified: $Date: 2003/10/27 15:01:08 $
Author: Alex Martelli <aleaxit at yahoo.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Oct-2003
Python-Version: 2.4
Post-History:


Abstract

    This PEP suggests that some iterator types should support being copied
    (shallowly or deeply), by exposing suitable methods such as __copy__ or
    __deepcopy__.


Motivation

    Some iterators, such as those which built-in function iter builds over
    sequences, should be intrinsically easy to copy (just get another
    reference to the same sequence and a copy of the index) or deepcopy
    (ditto but with a deepcopy of the underlying sequence).

    However, in Python up to 2.3, those iterators don't expose their state:
    user code which is using such an iterator X is not able to just "save" a
    copy of X (for future iteration from the current point) with a simple
    "saved_X = copy.copy(X)".

    To "tee" a non-copyable iterator into two independently iterable ones,
    subtle code [such as that which will go into itertools.tee in 2.4] is
    needed (a Python version thereof is currently in the itertools docs).

    In any case, such code will inevitably have to consume memory to keep
    a copy of the subsequence which has been iterated on by one but not
    both tee'd iterators.  This is a waste of memory in those cases in
    which the iterator being tee'd already _has_ an underlying sequence
    (or other iterable) and index.

    Having iterators expose a suitable __copy__ when feasible allows easy
    optimization of itertools.tee and similar user code, as in:

        def tee(it):
            it = iter(it)
            try: return it, copy.copy(it)
            except TypeError:
                # non-copyable iterator, do all the needed hard work


Specification

    Any iterator type X may expose a method __copy__ that is callable
    without arguments on any instance x of X.  The method should be
    exposed if and only if the iterator type can provide copyability with
    reasonably little computational and memory effort.  Similarly, X may
    expose a method __deepcopy__ that is callable with one argument (a memo
    dictionary) on instances of X.  The (very concise...) specs for
    these methods are at [2] -- for more details, see also file copy.py
    in the Python standard library.

    For example, suppose a class Iter essentially duplicated the
    functionality of the iter builtin for iterating on a sequence:

        class Iter(object):

            def __init__(self, sequence):
                self.sequence = sequence
                self.index = 0

            def __iter__(self):
                return self

            def next(self):
                try: result = self.sequence[self.index]
		except IndexError: raise StopIteration
                self.index += 1
                return result

    To make this Iter class compliant with this PEP, the following
    additions to the body of class Iter would suffice:

            def __copy__(self):
                result = self.__class__(sequence)
                result.index = self.index
                return result

            def __deepcopy__(self, memo):
                result = self.__copy__()
                result.sequence = deepcopy(self.sequence, memo)
                return result


Details

    Besides adding to the Python docs a recommendation that user-coded
    iterators be made copyable when feasible, this PEP's implementation
    will specifically include the addition of copyability to the iterators
    over sequences that built-in iter returns, and also to the iterators 
    over a dictionary returned by the methods __iter__, iterkeys, itervalues,
    and iteritems of built-in type dict.

    Iterators produced by generators will not be copyable (the BDFL deems
    shallow copy impossible, and deep copy too much trouble).  However,
    iterators produced by the new "generator expressions" of Python 2.4
    (PEP 289 [3]) should be copyable if their underlying iterator[s] are;
    the strict limitations on what is possible in a generator expression,
    compared to the much vaster generality of a generator, should make
    that feasible.  Similarly, the iterators produced by the built-in
    function enumerate should be copyable if the underlying iterator is.

    The implementation of this PEP will also include the optimization
    of the new itertools.tee function mentioned in the Motivation section.


Rationale

    Being able to copy iterators will allow copying of user-coded classes
    that have copyable iterators as attributes.  This applies to both
    shallow and deep copying.

    Deep copyability of suitable iterators will allow "decoupling" from
    an underlying mutable sequence, and, according to the BDFL, may in
    the future allow certain iterators to become picklable (however, this
    PEP, in itself, does not include picklability of iterators).

    The main use case for (shallow) copying of an iterator is the same
    as for the function itertools.tee (new in 2.4).  Indeed, we assume
    that user code will typically not directly attempt to copy.copy an
    iterator, because it would have to deal with uncopyable cases; just
    calling itertools.tee will enable copying when feasible, with an
    implicit fallback to a maximally efficient non-copying strategy for
    iterators that are not copyable.

    A tee'd iterator may serve as a "reference point", allowing processing 
    of a sequence to continue or resume from a known point, while the other
    independent iterator can be freely advanced to "explore" a further part 
    of the sequence as needed.  To fully exploit this pattern, it should
    ideally also be possible, given two iterators originally produced by
    iterator.tee, to check if they are "equal" (have been stepped the same
    number of times); however, this PEP, in itself, does not include
    comparability of such iterators.  Therefore, code using this pattern
    may also need to count the number of forward steps taken after tee by 
    the iterators, so as to be able to tell which one is "behind", and by 
    how much.  Built-in function enumerate is one way to make such counting
    very easy in many situations.  However, such needs do not always arise.

    Here is a simpler example: a generator which, given an iterator of
    numbers (assumed to be positive), returns a corresponding iterator
    each of whose items is the fraction of the total corresponding to each
    corresponding item of the input iterator.  The caller may pass the
    total as a value, if known in advance; otherwise, the iterator
    returned by calling the generator will first compute the total.

        def fractions(numbers, total=None):
            if total is None:
                numbers, aux = itertools.tee(numbers)
                total = sum(aux)
            total = float(total)
            for item in numbers:
                yield item / total

    The ability to tee the numbers iterator allows this generator to
    precompute the total, if needed, without necessarily requiring
    O(N) auxiliary memory if the numbers iterator is copyable.
    

References

    [1] Discussion on python-dev starting at post:
        http://mail.python.org/pipermail/python-dev/2003-October/038969.html

    [2] Online documentation for the copy module of the standard library:
        http://www.python.org/doc/current/lib/module-copy.html

    [3] PEP 289, Generator Expressions, Hettinger
        http://www.python.org/peps/pep-0289.html

Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.254
retrieving revision 1.255
diff -C2 -d -r1.254 -r1.255
*** pep-0000.txt	22 Oct 2003 18:09:35 -0000	1.254
--- pep-0000.txt	27 Oct 2003 15:01:08 -0000	1.255
***************
*** 120,123 ****
--- 120,124 ----
   S   321  Date/Time Parsing and Formatting             Kuchling
   S   322  Reverse Iteration Methods                    Hettinger
+  S   323  Copyable Iterators                           Martelli
   S   754  IEEE 754 Floating Point Special Values       Warnes
  





More information about the Python-checkins mailing list