[Python-Dev] RE: [Python-checkins] python/nondist/pepspep-0323.txt, NONE, 1.1 pep-0000.txt, 1.254, 1.255

Alex Martelli aleaxit at yahoo.com
Tue Oct 28 13:24:54 EST 2003

On Tuesday 28 October 2003 06:09 pm, Raymond Hettinger wrote:
> [Alex]
> > Btw, random.py doesn't seem to supply pseudo-random iterators --
> > easy to make e.g. with iter(random.random, None) [assuming you
> > want a nonterminating one],
> Probably a bit faster with:
> 	starmap(random.random, repeat(()))

Yep, saving the useless compare does help:

[alex at lancelot bo]$ timeit.py -c -s'import random' -s'import itertools as it' 
> -s'xx=it.starmap(random.random, it.repeat(()))' 'xx.next()'
1000000 loops, best of 3: 1.37 usec per loop

[alex at lancelot bo]$ timeit.py -c -s'import random' -s'import itertools as it'     
-s'xx=iter(random.random, None)' 'xx.next()'
1000000 loops, best of 3: 1.62 usec per loop

Renewed compliments for itertools' speed!-)

> > but that wouldn't be copyable.  Should
> > something be done about that...?
> No.
> 1) The use case is not typical.  Other than random.random() and
> time.ctime(), it is rare to see functions of zero arguments that
> usefully return an infinite sequence of distinct values.

Sorry, I must have been unclear -- I had no "zero arguments"
limitation in mind.  Rather, I thought of something like:

it = random.iterator(random.Random().sample, range(8), 3)

now each call to it.next() would return a random sample of 3
numbers from range(8) w/o repetition.  I.e., the usual
    <makemeacallbackobject>(callable, *args)
idiom (as in Tkinter widgets' .after method, etc, etc).  What I'm
saying is that there is no reason type(it) shouldn't support a
__copy__ method -- as long as the underlying callable sports
an im_self which also exposes a __getstate__ method, at least.

Come to think of this, there may be other use cases for this
general approach than "random iterators".  Do you think that
an iterator on a callable *and args for it* would live well in
itertools?  That module IS, after all, your baby...

> 2) If you need a copy, run it through tee().

That's exactly what I plan to do -- but I would NOT want tee()
to consume O(N) memory [where N is how far out of step the
two iterators may get] in those cases where the iterator
argument DOES have a __copy__ method that can presumably
produce a usable copy with O(1) memory expenditure.  Thus,
I'd like itertools.tee to start by checking if its argument iterator
"is properly copyable".

Guido has pointed out that it would not be safe to just try
copy.copy(it), because that MIGHT produce a copy that does
not satisfy "iterator copying" semantics requirements.  As
an example, he has repeatedly mentioned "an iterator on a
tree which keeps ``a stack of indices''".

Here, I think, is an indication of the kind of thing he fears
(code untested beyond running it on that one example):

import copy

class TreeIter(object):
    def __init__(self, tree):
        self.tree = [tree]
        self.indx = [-1]
    def __iter__(self):
        return self
    def next(self):
        if not self.indx: raise StopIteration
        self.indx[-1] += 1
            result = self.tree[-1][self.indx[-1]]
        except IndexError:
            if not self.indx: raise StopIteration
            return self.next()
        if type(result) is not list: return result
        return self.next()

x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]

print 'all items, one:',
for i in TreeIter(x): print i,

print 'all items, two:',
it = TreeIter(x)
for i in it:
    print i,
    if i==6: cop = copy.copy(it)

print '>=6 items, one:',
for i in cop: print i,

print '>=6 items, two:',
it = TreeIter(x)
for i in it:
    if i==6: cop = copy.deepcopy(it)
for i in cop: print i,

Output is:

[alex at lancelot bo]$ python treit.py
all items, one: 1 2 3 4 5 6 7 8 9 10 11 12
all items, two: 1 2 3 4 5 6 7 8 9 10 11 12
>=6 items, one:
>=6 items, two: 7 8 9 10 11 12

i.e., the "iterator copy" returned by copy.copy does NOT satisfy
requirements!  (I've added the last tidbit to show that the one
returned by copy.deepcopy WOULD satisfy them, but, it's clearly
WAY too memory-costly to consider, far worse than tee()!!!!).

So, "safely copying an iterator" means ensuring the iterator's
author HAS thought specifically about allowing a copy -- in
which case, we can (well, we _must_:-) trust that they have
implemented things correctly.  Just using copy.copy(it) MIGHT
fall afoul of a default shallow copy not being sufficient.

Perhaps we can get by with checking for, and using if found,
a __copy__ method only.  Is there a specific need to support
__setstate__ etc, here?  I hope not -- still, these, too, are
things I must make sure are mentioned in the PEP.


More information about the Python-Dev mailing list