[Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

François Pinard pinard@iro.umontreal.ca
28 Aug 2002 15:34:11 -0400

[Guido van Rossum]

> Right now, [Tim] is felled by some kind of illness though.

For the mere crumbs falling off of his project table, he seems to be working
like hell, so no surprise his body asks him to slow down once in a while.

I wish he will soon be healthy and happy again!  Life is not the same when
he is away...

> All but the last will be iterated over many times.  In practice the inputs
> will be relatively small (I can't imagine using this for sequences with
> 100s of 1000s elements).

Do not under-estimate statisticians!  They are well known for burning oodles
and oodles of computer cycles, contemplating or combining data sets which
are not always small, in all ways imaginable.

Of course, even statisticians cannot afford all subsets of sets having
100 elements, or will not scan permutations of 1000 elements.  But 100s
or 1000s elements are well within the bounds of cartesian products.

> Or you might use Oren's favorite rule of thumb and listify it when iter(x)
> is iter(x) (or iter(x) is x).

I'm a bit annoyed by the idea that `iter(x)' might require some computation
for producing an iterator, and that we immediately throw away the result.
Granted that `__iter__(self): return self' is efficient when an object is
an iterator, but nowhere it is said that `__iter__' has to be efficient
when the object is a container, and it does not shock me that some complex
containers require time to produce their iterator.  I much prefer limiting
the use of `__iter__' for when one intends to use the iterator...

> def powerset(base):
>     """Powerset of an iterable, yielding lists."""
>     pairs = [(2**i, x) for i, x in enumerate(base)]
>     for n in xrange(2**len(pairs)):
>         yield [x for m, x in pairs if m&n]

Thanks!  Hmph!  This does not yield the subsets in "sorted order", like
the other `cogen' methods do, and I would prefer to keep that promise.
Hopefully, the optimisation I added this morning will make both algorithms
more comparable, speed-wise.  I should benchmark them to see.

François Pinard   http://www.iro.umontreal.ca/~pinard