Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv32477
Modified Files:
pep-0323.txt
Log Message:
Rewritten, with removal of deep-copying ideas, but otherwise many additions of
details and examples, as per feedback received on python-dev.
Index: pep-0323.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0323.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** pep-0323.txt 27 Oct 2003 15:01:08 -0000 1.1
--- pep-0323.txt 29 Oct 2003 13:39:51 -0000 1.2
***************
*** 9,52 ****
Created: 25-Oct-2003
Python-Version: 2.4
! Post-History:
Abstract
! This PEP suggests that some iterator types should support being copied
! (shallowly or deeply), by exposing suitable methods such as __copy__ or
! __deepcopy__.
Motivation
! Some iterators, such as those which built-in function iter builds over
! sequences, should be intrinsically easy to copy (just get another
! reference to the same sequence and a copy of the index) or deepcopy
! (ditto but with a deepcopy of the underlying sequence).
! However, in Python up to 2.3, those iterators don't expose their state:
! user code which is using such an iterator X is not able to just "save" a
! copy of X (for future iteration from the current point) with a simple
! "saved_X = copy.copy(X)".
! To "tee" a non-copyable iterator into two independently iterable ones,
! subtle code [such as that which will go into itertools.tee in 2.4] is
! needed (a Python version thereof is currently in the itertools docs).
! In any case, such code will inevitably have to consume memory to keep
! a copy of the subsequence which has been iterated on by one but not
! both tee'd iterators. This is a waste of memory in those cases in
! which the iterator being tee'd already _has_ an underlying sequence
! (or other iterable) and index.
! Having iterators expose a suitable __copy__ when feasible allows easy
! optimization of itertools.tee and similar user code, as in:
def tee(it):
it = iter(it)
! try: return it, copy.copy(it)
! except TypeError:
# non-copyable iterator, do all the needed hard work
--- 9,113 ----
Created: 25-Oct-2003
Python-Version: 2.4
! Post-History: 29-Oct-2003
Abstract
! This PEP suggests that some iterator types should support shallow
! copies of their instances by exposing a __copy__ method which meets
! some specific requirements, and indicates how code using an iterator
! might exploit such a __copy__ method when present.
Motivation
! In Python up to 2.3, most built-in iterator types don't let the user
! copy their instances. User-coded iterators that do let their clients
! call copy.copy on their instances may, or may not, happen to return,
! as a result of the copy, a separate iterator object that may be
! iterated upon independently from the original.
! Currently, "support" for copy.copy in a user-coded iterator type is
! almost invariably "accidental" -- i.e., the standard machinery of the
! copy method in Python's standard library's copy module does build and
! return a copy. However, the copy will be independently iterable with
! respect to the original only if calling .next() on an instance of that
! class happens to change instance state solely by rebinding some
! attributes to new values, and not by mutating some attributes'
! existing values.
! For example, an iterator whose "index" state is held as an integer
! attribute will probably give usable copies, since (integers being
! immutable) .next() presumably just rebinds that attribute. On the
! other hand, another iterator whose "index" state is held as a list
! attribute will probably mutate the same list object when .next()
! executes, and therefore copies of such an iterator will not be
! iterable separately and independently from the original.
! Given this existing situation, copy.copy(it) on some iterator object
! isn't very useful, nor, therefore, is it at all widely used. However,
! there are many cases in which being able to get a "snapshot" of an
! iterator, as a "bookmark", so as to be able to keep iterating along
! the sequence but later iterate again on the same sequence from the
! bookmark onwards, is useful. To support such "bookmarking", module
! itertools, in 2.4, has grown a 'tee' function, to be used as:
! it, bookmark = itertools.tee(it)
!
! The previous value of 'it' must not be used again, which is why this
! typical usage idiom rebinds the name. After this call, 'it' and
! 'bookmark' are independently-iterable iterators on the same underlying
! sequence as the original value of 'it': this satisfies application
! needs for "iterator copying".
!
! However, when itertools.tee can make no hypotheses about the nature of
! the iterator it is passed as an argument, it must save in memory all
! items through which one of the two 'teed' iterators, but not yet both,
! have stepped. This can be quite costly in terms of memory, if the two
! iterators get very far from each other in their stepping; indeed, in
! some cases it may be preferable to make a list from the iterator so as
! to be able to step repeatedly through the subsequence, or, if that is
! too costy in terms of memory, save items to disk, again in order to be
! able to iterate through them repeatedly.
!
! This PEP proposes another idea that will, in some important cases,
! allow itertools.tee to do its job with minimal cost in terms of
! memory; user code may also occasionally be able to exploit the idea in
! order to decide whether to copy an iterator, make a list from it, or
! use an auxiliary disk file.
!
! The key consideration is that some important iterators, such as those
! which built-in function iter builds over sequences, would be
! intrinsically easy to copy: just get another reference to the same
! sequence, and a copy of the integer index. However, in Python 2.3,
! those iterators don't expose the state, and don't support copy.copy.
!
! The purpose of this PEP, therefore, is to have those iterator types
! expose a suitable __copy__ method. Similarly, user-coded iterator
! types that can provide copies of their instances, suitable for
! separate and independent iteration, with limited costs in time and
! space, should also expose a suitable __copy__ method. While
! copy.copy also supports other ways to let a type control the way
! its instances are copied, it is suggested, for simplicity, that
! iterator types that support copying always do so by exposing a
! __copy__ method, and not in the other ways copy.copy supports.
!
! Having iterators expose a suitable __copy__ when feasible will afford
! easy optimization of itertools.tee and similar user code, as in:
def tee(it):
it = iter(it)
! try: copier = it.__copy__
! except AttributeError:
# non-copyable iterator, do all the needed hard work
+ # [snipped!]
+ else:
+ return it, copier()
+
+ Note that this function does NOT call "copy.copy(it)", which (even
+ after this PEP is implemented) might well still "just happen to
+ succeed". for some iterator type that is implemented as a user-coded
+ class. without really supplying an adequate "independently iterable"
+ copy object as its result.
***************
*** 56,64 ****
without arguments on any instance x of X. The method should be
exposed if and only if the iterator type can provide copyability with
! reasonably little computational and memory effort. Similarly, X may
! expose a method __deepcopy__ that is callable with one argument (a memo
! dictionary) on instances of X. The (very concise...) specs for
! these methods are at [2] -- for more details, see also file copy.py
! in the Python standard library.
For example, suppose a class Iter essentially duplicated the
--- 117,124 ----
without arguments on any instance x of X. The method should be
exposed if and only if the iterator type can provide copyability with
! reasonably little computational and memory effort. Furthermore, the
! new object y returned by method __copy__ should be a new instance
! of X that is iterable independently and separately from x, stepping
! along the same "underlying sequence" of items.
For example, suppose a class Iter essentially duplicated the
***************
*** 76,85 ****
def next(self):
try: result = self.sequence[self.index]
! except IndexError: raise StopIteration
self.index += 1
return result
To make this Iter class compliant with this PEP, the following
! additions to the body of class Iter would suffice:
def __copy__(self):
--- 136,145 ----
def next(self):
try: result = self.sequence[self.index]
! except IndexError: raise StopIteration
self.index += 1
return result
To make this Iter class compliant with this PEP, the following
! addition to the body of class Iter would suffice:
def __copy__(self):
***************
*** 88,157 ****
return result
! def __deepcopy__(self, memo):
! result = self.__copy__()
! result.sequence = deepcopy(self.sequence, memo)
! return result
! Details
! Besides adding to the Python docs a recommendation that user-coded
! iterators be made copyable when feasible, this PEP's implementation
! will specifically include the addition of copyability to the iterators
! over sequences that built-in iter returns, and also to the iterators
! over a dictionary returned by the methods __iter__, iterkeys, itervalues,
! and iteritems of built-in type dict.
! Iterators produced by generators will not be copyable (the BDFL deems
! shallow copy impossible, and deep copy too much trouble). However,
! iterators produced by the new "generator expressions" of Python 2.4
! (PEP 289 [3]) should be copyable if their underlying iterator[s] are;
! the strict limitations on what is possible in a generator expression,
! compared to the much vaster generality of a generator, should make
! that feasible. Similarly, the iterators produced by the built-in
! function enumerate should be copyable if the underlying iterator is.
! The implementation of this PEP will also include the optimization
! of the new itertools.tee function mentioned in the Motivation section.
! Rationale
! Being able to copy iterators will allow copying of user-coded classes
! that have copyable iterators as attributes. This applies to both
! shallow and deep copying.
! Deep copyability of suitable iterators will allow "decoupling" from
! an underlying mutable sequence, and, according to the BDFL, may in
! the future allow certain iterators to become picklable (however, this
! PEP, in itself, does not include picklability of iterators).
! The main use case for (shallow) copying of an iterator is the same
! as for the function itertools.tee (new in 2.4). Indeed, we assume
! that user code will typically not directly attempt to copy.copy an
! iterator, because it would have to deal with uncopyable cases; just
! calling itertools.tee will enable copying when feasible, with an
! implicit fallback to a maximally efficient non-copying strategy for
! iterators that are not copyable.
! A tee'd iterator may serve as a "reference point", allowing processing
! of a sequence to continue or resume from a known point, while the other
! independent iterator can be freely advanced to "explore" a further part
! of the sequence as needed. To fully exploit this pattern, it should
! ideally also be possible, given two iterators originally produced by
! iterator.tee, to check if they are "equal" (have been stepped the same
! number of times); however, this PEP, in itself, does not include
! comparability of such iterators. Therefore, code using this pattern
! may also need to count the number of forward steps taken after tee by
! the iterators, so as to be able to tell which one is "behind", and by
! how much. Built-in function enumerate is one way to make such counting
! very easy in many situations. However, such needs do not always arise.
! Here is a simpler example: a generator which, given an iterator of
! numbers (assumed to be positive), returns a corresponding iterator
! each of whose items is the fraction of the total corresponding to each
! corresponding item of the input iterator. The caller may pass the
! total as a value, if known in advance; otherwise, the iterator
! returned by calling the generator will first compute the total.
def fractions(numbers, total=None):
--- 148,266 ----
return result
! Note that __copy__, in this case, does not even try to copy the
! sequence; if the sequence is altered while either or both of the
! original and copied iterators are still stepping on it, the iteration
! behavior is quite likely to go awry anyway -- it is not __copy__'s
! responsibility to change this normal Python behavior for iterators
! which iterate on mutable sequences (that might, perhaps, be the
! specification for a __deepcopy__ method of iterators, which, however,
! this PEP does not deal with).
+ Consider also a "random iterator", which provides a nonterminating
+ sequence of results from some method of a random instance, called
+ with given arguments:
! class RandomIterator(object):
! def __init__(self, bound_method, *args):
! self.call = bound_method
! self.args = args
! def __iter__(self):
! return self
! def next(self):
! return self.call(*self.args)
!
! def __copy__(self):
! import copy, new
! im_self = copy.copy(self.call.im_self)
! method = new.instancemethod(self.call.im_func, im_self)
! return self.__class__(method, *self.args)
+ This iterator type is slightly more general than its name implies, as
+ it supports calls to any bound method (or other callable, but if the
+ callable is not a bound method, then method __copy__ will fail). But
+ the use case is for the purpose of generating random streams, as in:
! import random
! def show5(it):
! for i, result in enumerate(it):
! print '%6.3f'%result,
! if i==4: break
! print
! normit = RandomIterator(random.Random().gauss, 0, 1)
! show5(normit)
! copit = normit.__copy__()
! show5(normit)
! show5(copit)
! which will display some output such as:
! -0.536 1.936 -1.182 -1.690 -1.184
! 0.666 -0.701 1.214 0.348 1.373
! 0.666 -0.701 1.214 0.348 1.373
! the key point being that the second and third lines are equal, because
! the normit and copit iterators will step along the same "underlying
! sequence". (As an aside, note that to get a copy of self.call.im_self
! we must use copy.copy, NOT try getting at a __copy__ method directly,
! because for example instances of random.Random support copying via
! __getstate__ and __setstate__, NOT via __copy__; indeed, using
! copy.copy is the normal way to get a shallow copy of any object --
! copyable iterators are different because of the already-mentioned
! uncertainty about the result of copy.copy supporting these "copyable
! iterator" specs).
!
!
! Details
!
! Besides adding to the Python docs a recommendation that user-coded
! iterator types support a __copy__ method (if and only if it can be
! implemented with small costs in memory and runtime, and produce an
! independently-iterable copy of an iterator object), this PEP's
! implementation will specifically include the addition of copyability
! to the iterators over sequences that built-in iter returns, and also
! to the iterators over a dictionary returned by the methods __iter__,
! iterkeys, itervalues, and iteritems of built-in type dict.
!
! Iterators produced by generator functions will not be copyable.
! However, iterators produced by the new "generator expressions" of
! Python 2.4 (PEP 289 [3]) should be copyable if their underlying
! iterator[s] are; the strict limitations on what is possible in a
! generator expression, compared to the much vaster generality of a
! generator, should make that feasible. Similarly, the iterators
! produced by the built-in function enumerate, and certain functions
! suppiled by module itertools, should be copyable if the underlying
! iterators are.
!
! The implementation of this PEP will also include the optimization of
! the new itertools.tee function mentioned in the Motivation section.
!
!
! Rationale
!
! The main use case for (shallow) copying of an iterator is the same as
! for the function itertools.tee (new in 2.4). User code will not
! directly attempt to copy an iterator, because it would have to deal
! separately with uncopyable cases; calling itertools.tee will
! internally perform the copy when appropriate, and implicitly fallback
! to a maximally efficient non-copying strategy for iterators that are
! not copyable. (Occasionally, user code may want more direct control,
! specifically in order to deal with non-copyable iterators by other
! strategies, such as making a list or saving the sequence to disk).
!
! A tee'd iterator may serve as a "reference point", allowing processing
! of a sequence to continue or resume from a known point, while the
! other independent iterator can be freely advanced to "explore" a
! further part of the sequence as needed. A simple example: a generator
! function which, given an iterator of numbers (assumed to be positive),
! returns a corresponding iterator, each of whose items is the fraction
! of the total corresponding to each corresponding item of the input
! iterator. The caller may pass the total as a value, if known in
! advance; otherwise, the iterator returned by calling this generator
! function will first compute the total.
def fractions(numbers, total=None):
***************
*** 166,170 ****
--- 275,442 ----
precompute the total, if needed, without necessarily requiring
O(N) auxiliary memory if the numbers iterator is copyable.
+
+ As another example of "iterator bookmarking", consider a stream of
+ numbers with an occasional string as a "postfix operator" now and
+ then. By far most frequent such operator is a '+', whereupon we must
+ sum all previous numbers (since the last previous operator if any, or
+ else since the start) and yield the result. Sometimes we find a '*'
+ instead, which is the same except that the previous numbers must
+ instead be multiplied, not summed.
+
+ def filter_weird_stream(stream):
+ it = iter(stream)
+ while True:
+ it, bookmark = itertools.tee(it)
+ total = 0
+ for item in it:
+ if item=='+':
+ yield total
+ break
+ elif item=='*':
+ product = 1
+ for item in bookmark:
+ if item=='*':
+ yield product
+ break
+ else:
+ product *= item
+ else:
+ total += item
+
+ Similar use cases of itertools.tee can support such tasks as
+ "undo" on a stream of commands represented by an iterator,
+ "backtracking" on the parse of a stream of tokens, and so on.
+ (Of course, in each case, one should also consider simpler
+ possibilities such as saving relevant portions of the sequence
+ into lists while stepping on the sequence with just one iterator,
+ depending on the details of one's task).
+
+
+ Here is an example, in pure Python, of how the 'enumerate'
+ built-in could be extended to support __copy__ if its underlying
+ iterator also supported __copy__:
+
+ class enumerate(object):
+
+ def __init__(self, it):
+ self.it = iter(it)
+ self.i = -1
+
+ def __iter__(self):
+ return self
+
+ def next(self):
+ self.i += 1
+ return self.i, self.it.next()
+
+ def __copy__(self):
+ result = self.__class__.new()
+ result.it = self.it.__copy__()
+ result.i = self.i
+ return result
+
+
+ Here is an example of the kind of "fragility" produced by "accidental
+ copyability" of an iterator -- the reason why one must NOT use
+ copy.copy expecting, if it succeeds, to receive as a result an
+ iterator which is iterable-on independently from the original. Here
+ is an iterator class that iterates (in preorder) on "trees" which, for
+ simplicity, are just nested lists -- any item that's a list is treated
+ as a subtree, any other item as a leaf.
+
+ class ListreeIter(object):
+
+ def __init__(self, tree):
+ self.tree = [tree]
+ self.indx = [-1]
+
+ def __iter__(self):
+ return self
+
+ def next(self):
+ if not self.indx:
+ raise StopIteration
+ self.indx[-1] += 1
+ try:
+ result = self.tree[-1][self.indx[-1]]
+ except IndexError:
+ self.tree.pop()
+ self.indx.pop()
+ return self.next()
+ if type(result) is not list:
+ return result
+ self.tree.append(result)
+ self.indx.append(-1)
+ return self.next()
+
+ Now, for example, the following code:
+
+ import copy
+ x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
+
+ print 'showing all items:',
+ it = ListreeIter(x)
+ for i in it:
+ print i,
+ if i==6: cop = copy.copy(it)
+ print
+
+ print 'showing items >6 again:'
+ for i in cop: print i,
+ print
+
+ does NOT work as intended -- the "cop" iterator gets consumed, and
+ exhausted, step by step as the original "it" iterator is, because
+ the accidental (rather than deliberate) copying performed by
+ copy.copy shares, rather than duplicating the "index" list, which
+ is the mutable attribute it.indx (a list of numerical indices).
+ Thus, this "client code" of the iterator, which attemps to iterate
+ twice over a portion of the sequence via a copy.copy on the
+ iterator, is NOT correct.
+
+ Some correct solutions include using itertools.tee, i.e., changing
+ the first for loop into:
+
+ for i in it:
+ print i,
+ if i==6:
+ it, cop = itertools.tee(it)
+ break
+ for i in it: print i,
+
+ (note that we MUST break the loop in two, otherwise we'd still
+ be looping on the ORIGINAL value of it, which must NOT be used
+ further after the call to tee!!!); or making a list, i.e.:
+
+ for i in it:
+ print i,
+ if i==6:
+ cop = lit = list(it)
+ break
+ for i in lit: print i,
+ (again, the loop must be broken in two, since iterator 'it'
+ gets exhausted by the call list(it)).
+
+ Finally, all of these solutions would work if Listiter supplied
+ a suitable __copy__ method, as this PEP recommends:
+
+ def __copy__(self):
+ result = self.__class__.new()
+ result.tree = copy.copy(self.tree)
+ result.indx = copy.copy(self.indx)
+ return result
+
+ There is no need to get any "deeper" in the copy, but the two
+ mutable "index state" attributes must indeed be copied in order
+ to achieve a "proper" (independently iterable) iterator-copy.
+
+ The recommended solution is to have class Listiter supply this
+ __copy__ method AND have client code use itertools.tee (with
+ the split-in-two-parts loop as shown above). This will make
+ client code maximally tolerant of different iterator types it
+ might be using AND achieve good performance for tee'ing of this
+ specific iterator type at the same time.
+
References