python/nondist/peps pep-0323.txt,1.1,1.2
![](https://secure.gravatar.com/avatar/558c7d1982e92f48d2038bbbe5948f42.jpg?s=120&d=mm&r=g)
Update of /cvsroot/python/python/nondist/peps In directory sc8-pr-cvs1:/tmp/cvs-serv32477 Modified Files: pep-0323.txt Log Message: Rewritten, with removal of deep-copying ideas, but otherwise many additions of details and examples, as per feedback received on python-dev. Index: pep-0323.txt =================================================================== RCS file: /cvsroot/python/python/nondist/peps/pep-0323.txt,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** pep-0323.txt 27 Oct 2003 15:01:08 -0000 1.1 --- pep-0323.txt 29 Oct 2003 13:39:51 -0000 1.2 *************** *** 9,52 **** Created: 25-Oct-2003 Python-Version: 2.4 ! Post-History: Abstract ! This PEP suggests that some iterator types should support being copied ! (shallowly or deeply), by exposing suitable methods such as __copy__ or ! __deepcopy__. Motivation ! Some iterators, such as those which built-in function iter builds over ! sequences, should be intrinsically easy to copy (just get another ! reference to the same sequence and a copy of the index) or deepcopy ! (ditto but with a deepcopy of the underlying sequence). ! However, in Python up to 2.3, those iterators don't expose their state: ! user code which is using such an iterator X is not able to just "save" a ! copy of X (for future iteration from the current point) with a simple ! "saved_X = copy.copy(X)". ! To "tee" a non-copyable iterator into two independently iterable ones, ! subtle code [such as that which will go into itertools.tee in 2.4] is ! needed (a Python version thereof is currently in the itertools docs). ! In any case, such code will inevitably have to consume memory to keep ! a copy of the subsequence which has been iterated on by one but not ! both tee'd iterators. This is a waste of memory in those cases in ! which the iterator being tee'd already _has_ an underlying sequence ! (or other iterable) and index. ! Having iterators expose a suitable __copy__ when feasible allows easy ! optimization of itertools.tee and similar user code, as in: def tee(it): it = iter(it) ! try: return it, copy.copy(it) ! except TypeError: # non-copyable iterator, do all the needed hard work --- 9,113 ---- Created: 25-Oct-2003 Python-Version: 2.4 ! Post-History: 29-Oct-2003 Abstract ! This PEP suggests that some iterator types should support shallow ! copies of their instances by exposing a __copy__ method which meets ! some specific requirements, and indicates how code using an iterator ! might exploit such a __copy__ method when present. Motivation ! In Python up to 2.3, most built-in iterator types don't let the user ! copy their instances. User-coded iterators that do let their clients ! call copy.copy on their instances may, or may not, happen to return, ! as a result of the copy, a separate iterator object that may be ! iterated upon independently from the original. ! Currently, "support" for copy.copy in a user-coded iterator type is ! almost invariably "accidental" -- i.e., the standard machinery of the ! copy method in Python's standard library's copy module does build and ! return a copy. However, the copy will be independently iterable with ! respect to the original only if calling .next() on an instance of that ! class happens to change instance state solely by rebinding some ! attributes to new values, and not by mutating some attributes' ! existing values. ! For example, an iterator whose "index" state is held as an integer ! attribute will probably give usable copies, since (integers being ! immutable) .next() presumably just rebinds that attribute. On the ! other hand, another iterator whose "index" state is held as a list ! attribute will probably mutate the same list object when .next() ! executes, and therefore copies of such an iterator will not be ! iterable separately and independently from the original. ! Given this existing situation, copy.copy(it) on some iterator object ! isn't very useful, nor, therefore, is it at all widely used. However, ! there are many cases in which being able to get a "snapshot" of an ! iterator, as a "bookmark", so as to be able to keep iterating along ! the sequence but later iterate again on the same sequence from the ! bookmark onwards, is useful. To support such "bookmarking", module ! itertools, in 2.4, has grown a 'tee' function, to be used as: ! it, bookmark = itertools.tee(it) ! ! The previous value of 'it' must not be used again, which is why this ! typical usage idiom rebinds the name. After this call, 'it' and ! 'bookmark' are independently-iterable iterators on the same underlying ! sequence as the original value of 'it': this satisfies application ! needs for "iterator copying". ! ! However, when itertools.tee can make no hypotheses about the nature of ! the iterator it is passed as an argument, it must save in memory all ! items through which one of the two 'teed' iterators, but not yet both, ! have stepped. This can be quite costly in terms of memory, if the two ! iterators get very far from each other in their stepping; indeed, in ! some cases it may be preferable to make a list from the iterator so as ! to be able to step repeatedly through the subsequence, or, if that is ! too costy in terms of memory, save items to disk, again in order to be ! able to iterate through them repeatedly. ! ! This PEP proposes another idea that will, in some important cases, ! allow itertools.tee to do its job with minimal cost in terms of ! memory; user code may also occasionally be able to exploit the idea in ! order to decide whether to copy an iterator, make a list from it, or ! use an auxiliary disk file. ! ! The key consideration is that some important iterators, such as those ! which built-in function iter builds over sequences, would be ! intrinsically easy to copy: just get another reference to the same ! sequence, and a copy of the integer index. However, in Python 2.3, ! those iterators don't expose the state, and don't support copy.copy. ! ! The purpose of this PEP, therefore, is to have those iterator types ! expose a suitable __copy__ method. Similarly, user-coded iterator ! types that can provide copies of their instances, suitable for ! separate and independent iteration, with limited costs in time and ! space, should also expose a suitable __copy__ method. While ! copy.copy also supports other ways to let a type control the way ! its instances are copied, it is suggested, for simplicity, that ! iterator types that support copying always do so by exposing a ! __copy__ method, and not in the other ways copy.copy supports. ! ! Having iterators expose a suitable __copy__ when feasible will afford ! easy optimization of itertools.tee and similar user code, as in: def tee(it): it = iter(it) ! try: copier = it.__copy__ ! except AttributeError: # non-copyable iterator, do all the needed hard work + # [snipped!] + else: + return it, copier() + + Note that this function does NOT call "copy.copy(it)", which (even + after this PEP is implemented) might well still "just happen to + succeed". for some iterator type that is implemented as a user-coded + class. without really supplying an adequate "independently iterable" + copy object as its result. *************** *** 56,64 **** without arguments on any instance x of X. The method should be exposed if and only if the iterator type can provide copyability with ! reasonably little computational and memory effort. Similarly, X may ! expose a method __deepcopy__ that is callable with one argument (a memo ! dictionary) on instances of X. The (very concise...) specs for ! these methods are at [2] -- for more details, see also file copy.py ! in the Python standard library. For example, suppose a class Iter essentially duplicated the --- 117,124 ---- without arguments on any instance x of X. The method should be exposed if and only if the iterator type can provide copyability with ! reasonably little computational and memory effort. Furthermore, the ! new object y returned by method __copy__ should be a new instance ! of X that is iterable independently and separately from x, stepping ! along the same "underlying sequence" of items. For example, suppose a class Iter essentially duplicated the *************** *** 76,85 **** def next(self): try: result = self.sequence[self.index] ! except IndexError: raise StopIteration self.index += 1 return result To make this Iter class compliant with this PEP, the following ! additions to the body of class Iter would suffice: def __copy__(self): --- 136,145 ---- def next(self): try: result = self.sequence[self.index] ! except IndexError: raise StopIteration self.index += 1 return result To make this Iter class compliant with this PEP, the following ! addition to the body of class Iter would suffice: def __copy__(self): *************** *** 88,157 **** return result ! def __deepcopy__(self, memo): ! result = self.__copy__() ! result.sequence = deepcopy(self.sequence, memo) ! return result ! Details ! Besides adding to the Python docs a recommendation that user-coded ! iterators be made copyable when feasible, this PEP's implementation ! will specifically include the addition of copyability to the iterators ! over sequences that built-in iter returns, and also to the iterators ! over a dictionary returned by the methods __iter__, iterkeys, itervalues, ! and iteritems of built-in type dict. ! Iterators produced by generators will not be copyable (the BDFL deems ! shallow copy impossible, and deep copy too much trouble). However, ! iterators produced by the new "generator expressions" of Python 2.4 ! (PEP 289 [3]) should be copyable if their underlying iterator[s] are; ! the strict limitations on what is possible in a generator expression, ! compared to the much vaster generality of a generator, should make ! that feasible. Similarly, the iterators produced by the built-in ! function enumerate should be copyable if the underlying iterator is. ! The implementation of this PEP will also include the optimization ! of the new itertools.tee function mentioned in the Motivation section. ! Rationale ! Being able to copy iterators will allow copying of user-coded classes ! that have copyable iterators as attributes. This applies to both ! shallow and deep copying. ! Deep copyability of suitable iterators will allow "decoupling" from ! an underlying mutable sequence, and, according to the BDFL, may in ! the future allow certain iterators to become picklable (however, this ! PEP, in itself, does not include picklability of iterators). ! The main use case for (shallow) copying of an iterator is the same ! as for the function itertools.tee (new in 2.4). Indeed, we assume ! that user code will typically not directly attempt to copy.copy an ! iterator, because it would have to deal with uncopyable cases; just ! calling itertools.tee will enable copying when feasible, with an ! implicit fallback to a maximally efficient non-copying strategy for ! iterators that are not copyable. ! A tee'd iterator may serve as a "reference point", allowing processing ! of a sequence to continue or resume from a known point, while the other ! independent iterator can be freely advanced to "explore" a further part ! of the sequence as needed. To fully exploit this pattern, it should ! ideally also be possible, given two iterators originally produced by ! iterator.tee, to check if they are "equal" (have been stepped the same ! number of times); however, this PEP, in itself, does not include ! comparability of such iterators. Therefore, code using this pattern ! may also need to count the number of forward steps taken after tee by ! the iterators, so as to be able to tell which one is "behind", and by ! how much. Built-in function enumerate is one way to make such counting ! very easy in many situations. However, such needs do not always arise. ! Here is a simpler example: a generator which, given an iterator of ! numbers (assumed to be positive), returns a corresponding iterator ! each of whose items is the fraction of the total corresponding to each ! corresponding item of the input iterator. The caller may pass the ! total as a value, if known in advance; otherwise, the iterator ! returned by calling the generator will first compute the total. def fractions(numbers, total=None): --- 148,266 ---- return result ! Note that __copy__, in this case, does not even try to copy the ! sequence; if the sequence is altered while either or both of the ! original and copied iterators are still stepping on it, the iteration ! behavior is quite likely to go awry anyway -- it is not __copy__'s ! responsibility to change this normal Python behavior for iterators ! which iterate on mutable sequences (that might, perhaps, be the ! specification for a __deepcopy__ method of iterators, which, however, ! this PEP does not deal with). + Consider also a "random iterator", which provides a nonterminating + sequence of results from some method of a random instance, called + with given arguments: ! class RandomIterator(object): ! def __init__(self, bound_method, *args): ! self.call = bound_method ! self.args = args ! def __iter__(self): ! return self ! def next(self): ! return self.call(*self.args) ! ! def __copy__(self): ! import copy, new ! im_self = copy.copy(self.call.im_self) ! method = new.instancemethod(self.call.im_func, im_self) ! return self.__class__(method, *self.args) + This iterator type is slightly more general than its name implies, as + it supports calls to any bound method (or other callable, but if the + callable is not a bound method, then method __copy__ will fail). But + the use case is for the purpose of generating random streams, as in: ! import random ! def show5(it): ! for i, result in enumerate(it): ! print '%6.3f'%result, ! if i==4: break ! print ! normit = RandomIterator(random.Random().gauss, 0, 1) ! show5(normit) ! copit = normit.__copy__() ! show5(normit) ! show5(copit) ! which will display some output such as: ! -0.536 1.936 -1.182 -1.690 -1.184 ! 0.666 -0.701 1.214 0.348 1.373 ! 0.666 -0.701 1.214 0.348 1.373 ! the key point being that the second and third lines are equal, because ! the normit and copit iterators will step along the same "underlying ! sequence". (As an aside, note that to get a copy of self.call.im_self ! we must use copy.copy, NOT try getting at a __copy__ method directly, ! because for example instances of random.Random support copying via ! __getstate__ and __setstate__, NOT via __copy__; indeed, using ! copy.copy is the normal way to get a shallow copy of any object -- ! copyable iterators are different because of the already-mentioned ! uncertainty about the result of copy.copy supporting these "copyable ! iterator" specs). ! ! ! Details ! ! Besides adding to the Python docs a recommendation that user-coded ! iterator types support a __copy__ method (if and only if it can be ! implemented with small costs in memory and runtime, and produce an ! independently-iterable copy of an iterator object), this PEP's ! implementation will specifically include the addition of copyability ! to the iterators over sequences that built-in iter returns, and also ! to the iterators over a dictionary returned by the methods __iter__, ! iterkeys, itervalues, and iteritems of built-in type dict. ! ! Iterators produced by generator functions will not be copyable. ! However, iterators produced by the new "generator expressions" of ! Python 2.4 (PEP 289 [3]) should be copyable if their underlying ! iterator[s] are; the strict limitations on what is possible in a ! generator expression, compared to the much vaster generality of a ! generator, should make that feasible. Similarly, the iterators ! produced by the built-in function enumerate, and certain functions ! suppiled by module itertools, should be copyable if the underlying ! iterators are. ! ! The implementation of this PEP will also include the optimization of ! the new itertools.tee function mentioned in the Motivation section. ! ! ! Rationale ! ! The main use case for (shallow) copying of an iterator is the same as ! for the function itertools.tee (new in 2.4). User code will not ! directly attempt to copy an iterator, because it would have to deal ! separately with uncopyable cases; calling itertools.tee will ! internally perform the copy when appropriate, and implicitly fallback ! to a maximally efficient non-copying strategy for iterators that are ! not copyable. (Occasionally, user code may want more direct control, ! specifically in order to deal with non-copyable iterators by other ! strategies, such as making a list or saving the sequence to disk). ! ! A tee'd iterator may serve as a "reference point", allowing processing ! of a sequence to continue or resume from a known point, while the ! other independent iterator can be freely advanced to "explore" a ! further part of the sequence as needed. A simple example: a generator ! function which, given an iterator of numbers (assumed to be positive), ! returns a corresponding iterator, each of whose items is the fraction ! of the total corresponding to each corresponding item of the input ! iterator. The caller may pass the total as a value, if known in ! advance; otherwise, the iterator returned by calling this generator ! function will first compute the total. def fractions(numbers, total=None): *************** *** 166,170 **** --- 275,442 ---- precompute the total, if needed, without necessarily requiring O(N) auxiliary memory if the numbers iterator is copyable. + + As another example of "iterator bookmarking", consider a stream of + numbers with an occasional string as a "postfix operator" now and + then. By far most frequent such operator is a '+', whereupon we must + sum all previous numbers (since the last previous operator if any, or + else since the start) and yield the result. Sometimes we find a '*' + instead, which is the same except that the previous numbers must + instead be multiplied, not summed. + + def filter_weird_stream(stream): + it = iter(stream) + while True: + it, bookmark = itertools.tee(it) + total = 0 + for item in it: + if item=='+': + yield total + break + elif item=='*': + product = 1 + for item in bookmark: + if item=='*': + yield product + break + else: + product *= item + else: + total += item + + Similar use cases of itertools.tee can support such tasks as + "undo" on a stream of commands represented by an iterator, + "backtracking" on the parse of a stream of tokens, and so on. + (Of course, in each case, one should also consider simpler + possibilities such as saving relevant portions of the sequence + into lists while stepping on the sequence with just one iterator, + depending on the details of one's task). + + + Here is an example, in pure Python, of how the 'enumerate' + built-in could be extended to support __copy__ if its underlying + iterator also supported __copy__: + + class enumerate(object): + + def __init__(self, it): + self.it = iter(it) + self.i = -1 + + def __iter__(self): + return self + + def next(self): + self.i += 1 + return self.i, self.it.next() + + def __copy__(self): + result = self.__class__.new() + result.it = self.it.__copy__() + result.i = self.i + return result + + + Here is an example of the kind of "fragility" produced by "accidental + copyability" of an iterator -- the reason why one must NOT use + copy.copy expecting, if it succeeds, to receive as a result an + iterator which is iterable-on independently from the original. Here + is an iterator class that iterates (in preorder) on "trees" which, for + simplicity, are just nested lists -- any item that's a list is treated + as a subtree, any other item as a leaf. + + class ListreeIter(object): + + def __init__(self, tree): + self.tree = [tree] + self.indx = [-1] + + def __iter__(self): + return self + + def next(self): + if not self.indx: + raise StopIteration + self.indx[-1] += 1 + try: + result = self.tree[-1][self.indx[-1]] + except IndexError: + self.tree.pop() + self.indx.pop() + return self.next() + if type(result) is not list: + return result + self.tree.append(result) + self.indx.append(-1) + return self.next() + + Now, for example, the following code: + + import copy + x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ] + + print 'showing all items:', + it = ListreeIter(x) + for i in it: + print i, + if i==6: cop = copy.copy(it) + print + + print 'showing items >6 again:' + for i in cop: print i, + print + + does NOT work as intended -- the "cop" iterator gets consumed, and + exhausted, step by step as the original "it" iterator is, because + the accidental (rather than deliberate) copying performed by + copy.copy shares, rather than duplicating the "index" list, which + is the mutable attribute it.indx (a list of numerical indices). + Thus, this "client code" of the iterator, which attemps to iterate + twice over a portion of the sequence via a copy.copy on the + iterator, is NOT correct. + + Some correct solutions include using itertools.tee, i.e., changing + the first for loop into: + + for i in it: + print i, + if i==6: + it, cop = itertools.tee(it) + break + for i in it: print i, + + (note that we MUST break the loop in two, otherwise we'd still + be looping on the ORIGINAL value of it, which must NOT be used + further after the call to tee!!!); or making a list, i.e.: + + for i in it: + print i, + if i==6: + cop = lit = list(it) + break + for i in lit: print i, + (again, the loop must be broken in two, since iterator 'it' + gets exhausted by the call list(it)). + + Finally, all of these solutions would work if Listiter supplied + a suitable __copy__ method, as this PEP recommends: + + def __copy__(self): + result = self.__class__.new() + result.tree = copy.copy(self.tree) + result.indx = copy.copy(self.indx) + return result + + There is no need to get any "deeper" in the copy, but the two + mutable "index state" attributes must indeed be copied in order + to achieve a "proper" (independently iterable) iterator-copy. + + The recommended solution is to have class Listiter supply this + __copy__ method AND have client code use itertools.tee (with + the split-in-two-parts loop as shown above). This will make + client code maximally tolerant of different iterator types it + might be using AND achieve good performance for tee'ing of this + specific iterator type at the same time. + References
participants (1)
-
aleax@users.sourceforge.net