python/nondist/peps pep-0323.txt, NONE, 1.1 pep-0000.txt, 1.254, 1.255
![](https://secure.gravatar.com/avatar/558c7d1982e92f48d2038bbbe5948f42.jpg?s=120&d=mm&r=g)
Update of /cvsroot/python/python/nondist/peps In directory sc8-pr-cvs1:/tmp/cvs-serv9167 Modified Files: pep-0000.txt Added Files: pep-0323.txt Log Message: Added PEP 323, "Copyable Iterators". --- NEW FILE: pep-0323.txt --- PEP: 323 Title: Copyable Iterators Version: $Revision: 1.1 $ Last-Modified: $Date: 2003/10/27 15:01:08 $ Author: Alex Martelli <aleaxit@yahoo.com> Status: Draft Type: Standards Track Content-Type: text/plain Created: 25-Oct-2003 Python-Version: 2.4 Post-History: Abstract This PEP suggests that some iterator types should support being copied (shallowly or deeply), by exposing suitable methods such as __copy__ or __deepcopy__. Motivation Some iterators, such as those which built-in function iter builds over sequences, should be intrinsically easy to copy (just get another reference to the same sequence and a copy of the index) or deepcopy (ditto but with a deepcopy of the underlying sequence). However, in Python up to 2.3, those iterators don't expose their state: user code which is using such an iterator X is not able to just "save" a copy of X (for future iteration from the current point) with a simple "saved_X = copy.copy(X)". To "tee" a non-copyable iterator into two independently iterable ones, subtle code [such as that which will go into itertools.tee in 2.4] is needed (a Python version thereof is currently in the itertools docs). In any case, such code will inevitably have to consume memory to keep a copy of the subsequence which has been iterated on by one but not both tee'd iterators. This is a waste of memory in those cases in which the iterator being tee'd already _has_ an underlying sequence (or other iterable) and index. Having iterators expose a suitable __copy__ when feasible allows easy optimization of itertools.tee and similar user code, as in: def tee(it): it = iter(it) try: return it, copy.copy(it) except TypeError: # non-copyable iterator, do all the needed hard work Specification Any iterator type X may expose a method __copy__ that is callable without arguments on any instance x of X. The method should be exposed if and only if the iterator type can provide copyability with reasonably little computational and memory effort. Similarly, X may expose a method __deepcopy__ that is callable with one argument (a memo dictionary) on instances of X. The (very concise...) specs for these methods are at [2] -- for more details, see also file copy.py in the Python standard library. For example, suppose a class Iter essentially duplicated the functionality of the iter builtin for iterating on a sequence: class Iter(object): def __init__(self, sequence): self.sequence = sequence self.index = 0 def __iter__(self): return self def next(self): try: result = self.sequence[self.index] except IndexError: raise StopIteration self.index += 1 return result To make this Iter class compliant with this PEP, the following additions to the body of class Iter would suffice: def __copy__(self): result = self.__class__(sequence) result.index = self.index return result def __deepcopy__(self, memo): result = self.__copy__() result.sequence = deepcopy(self.sequence, memo) return result Details Besides adding to the Python docs a recommendation that user-coded iterators be made copyable when feasible, this PEP's implementation will specifically include the addition of copyability to the iterators over sequences that built-in iter returns, and also to the iterators over a dictionary returned by the methods __iter__, iterkeys, itervalues, and iteritems of built-in type dict. Iterators produced by generators will not be copyable (the BDFL deems shallow copy impossible, and deep copy too much trouble). However, iterators produced by the new "generator expressions" of Python 2.4 (PEP 289 [3]) should be copyable if their underlying iterator[s] are; the strict limitations on what is possible in a generator expression, compared to the much vaster generality of a generator, should make that feasible. Similarly, the iterators produced by the built-in function enumerate should be copyable if the underlying iterator is. The implementation of this PEP will also include the optimization of the new itertools.tee function mentioned in the Motivation section. Rationale Being able to copy iterators will allow copying of user-coded classes that have copyable iterators as attributes. This applies to both shallow and deep copying. Deep copyability of suitable iterators will allow "decoupling" from an underlying mutable sequence, and, according to the BDFL, may in the future allow certain iterators to become picklable (however, this PEP, in itself, does not include picklability of iterators). The main use case for (shallow) copying of an iterator is the same as for the function itertools.tee (new in 2.4). Indeed, we assume that user code will typically not directly attempt to copy.copy an iterator, because it would have to deal with uncopyable cases; just calling itertools.tee will enable copying when feasible, with an implicit fallback to a maximally efficient non-copying strategy for iterators that are not copyable. A tee'd iterator may serve as a "reference point", allowing processing of a sequence to continue or resume from a known point, while the other independent iterator can be freely advanced to "explore" a further part of the sequence as needed. To fully exploit this pattern, it should ideally also be possible, given two iterators originally produced by iterator.tee, to check if they are "equal" (have been stepped the same number of times); however, this PEP, in itself, does not include comparability of such iterators. Therefore, code using this pattern may also need to count the number of forward steps taken after tee by the iterators, so as to be able to tell which one is "behind", and by how much. Built-in function enumerate is one way to make such counting very easy in many situations. However, such needs do not always arise. Here is a simpler example: a generator which, given an iterator of numbers (assumed to be positive), returns a corresponding iterator each of whose items is the fraction of the total corresponding to each corresponding item of the input iterator. The caller may pass the total as a value, if known in advance; otherwise, the iterator returned by calling the generator will first compute the total. def fractions(numbers, total=None): if total is None: numbers, aux = itertools.tee(numbers) total = sum(aux) total = float(total) for item in numbers: yield item / total The ability to tee the numbers iterator allows this generator to precompute the total, if needed, without necessarily requiring O(N) auxiliary memory if the numbers iterator is copyable. References [1] Discussion on python-dev starting at post: http://mail.python.org/pipermail/python-dev/2003-October/038969.html [2] Online documentation for the copy module of the standard library: http://www.python.org/doc/current/lib/module-copy.html [3] PEP 289, Generator Expressions, Hettinger http://www.python.org/peps/pep-0289.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: Index: pep-0000.txt =================================================================== RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v retrieving revision 1.254 retrieving revision 1.255 diff -C2 -d -r1.254 -r1.255 *** pep-0000.txt 22 Oct 2003 18:09:35 -0000 1.254 --- pep-0000.txt 27 Oct 2003 15:01:08 -0000 1.255 *************** *** 120,123 **** --- 120,124 ---- S 321 Date/Time Parsing and Formatting Kuchling S 322 Reverse Iteration Methods Hettinger + S 323 Copyable Iterators Martelli S 754 IEEE 754 Floating Point Special Values Warnes
participants (1)
-
aleax@users.sourceforge.net