[Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

Guido van Rossum guido@python.org
Wed, 28 Aug 2002 15:41:00 -0400


> Of course, even statisticians cannot afford all subsets of a set with
> 100 elements, nor will they scan the permutations of 1000 elements.  But
> hundreds or thousands of elements are well within the bounds of cartesian
> products.

Yes, and there the cost of an extra list() copy is negligible
(allocate and copy 4K bytes).
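
For concreteness, a rough sketch (in present-day Python; the name
`cartesian' is made up for illustration, this is not the `cogen' code)
of a cartesian product generator that takes those list() copies up
front.  For inputs of a few hundred or thousand elements, the copies
are tiny next to the number of tuples the product itself yields:

    def cartesian(*iterables):
        # The extra list() copies under discussion: cheap, one per input.
        pools = [list(it) for it in iterables]
        def rec(i):
            if i == len(pools):
                yield ()
                return
            for x in pools[i]:
                for rest in rec(i + 1):
                    yield (x,) + rest
        return rec(0)

    # Three inputs of sizes 2, 3 and 2 -> 12 tuples, but only 7
    # elements were copied.
    for t in cartesian("ab", [1, 2, 3], (True, False)):
        print(t)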

> > Or you might use Oren's favorite rule of thumb and listify it when iter(x)
> > is iter(x) (or iter(x) is x).
> 
> I'm a bit annoyed by the idea that `iter(x)' might require some computation
> to produce an iterator, only for us to immediately throw the result away.
> Granted, `__iter__(self): return self' is efficient when an object is
> an iterator, but nowhere is it said that `__iter__' has to be efficient
> when the object is a container, and it does not shock me that some complex
> containers require time to produce their iterator.  I much prefer limiting
> the use of `__iter__' to when one intends to use the iterator...

Yes, that's why I prefer to just make a list() copy.
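
Spelled out, the two policies look something like this (a sketch only;
the helper names are made up):

    def snapshot_if_iterator(x):
        # Oren's rule of thumb: copy only when x is its own iterator,
        # i.e. when a second pass over it would not start from the top.
        return list(x) if iter(x) is x else x

    def snapshot_always(x):
        # The simpler policy: always take a list() copy, which is safe
        # for any iterable and cheap at these sizes.
        return list(x)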

> > def powerset(base):
> >     """Powerset of an iterable, yielding lists."""
> >     pairs = [(2**i, x) for i, x in enumerate(base)]
> >     for n in xrange(2**len(pairs)):
> >         yield [x for m, x in pairs if m&n]
> 
> Thanks!  Hmph!  This does not yield the subsets in "sorted order", like
> the other `cogen' methods do, and I would prefer to keep that promise.

That may be a matter of permuting the bits?
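
Assuming that by "sorted order" you mean lexicographic order of the
element indices, a recursive generator gets there directly, no bit
fiddling needed; a rough, unchecked sketch:

    def powerset_sorted(base):
        """Yield subsets as lists, in lexicographic order of indices:
        for 'abc' that is [], [a], [a,b], [a,b,c], [a,c], [b], [b,c], [c].
        """
        items = list(base)
        def rec(start, prefix):
            yield prefix
            for i in range(start, len(items)):
                for sub in rec(i + 1, prefix + [items[i]]):
                    yield sub
        return rec(0, [])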

> Hopefully, the optimisation I added this morning will make both algorithms
> more comparable, speed-wise.  I should benchmark them to see.

Yes!  Nothing beats a benchmark as an eye-opener.
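
With the timeit module the comparison is only a few lines (a sketch in
present-day Python; `powerset_impls' and the two function names are
placeholders for whichever implementations end up being compared):

    import timeit

    setup = "from powerset_impls import powerset_bits, powerset_cogen"
    for name in ("powerset_bits", "powerset_cogen"):
        t = timeit.timeit("list(%s(range(15)))" % name,
                          setup=setup, number=10)
        print("%-14s %.3f s for 10 runs over a 15-element base"
              % (name, t))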

--Guido van Rossum (home page: http://www.python.org/~guido/)