[Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

Guido van Rossum guido@python.org
Wed, 28 Aug 2002 13:58:18 -0400


> Granted, thanks!  I postponed optimisations for the first draft of `cogen',
> but if it looks acceptable overall, we can try to get some speed of it now.

I'm not saying that it looks good overall -- I'd like to defer to Tim,
who has used and written this kind of utities for real, and who
probably has a lot of useful feedback.  Right now, he's felled by some
kind of ilness though.

> > def cartesian(*sequences):
> >     if len(sequences) == 0:
> >         yield []
> >     else:
> >         head, tail = sequences[:-1], sequences[-1]
> >         for x in cartesian(*head):
> >             for y in tail:
> >                     yield x + [y]
> 
> > I also wonder if perhaps ``tail = list(tail)'' should be inserted
> > just before the for loop, so that the arguments may be iterators as
> > well.
> 
> `cogen' does not make any special effort for protecting iterators
> given as input sequences.  `cartesian' is surely not the only place
> where iterators would create a problem.  Solving it as a special
> case for `cartesian' only is not very nice.  Of course, we might
> transform all sequences into lists everywhere in `cogen', but
> `list'-ing a list copies it, and I'm not sure this would be such a
> good idea in the end.

Hm.  All but the last will be iterated over many times.  In practice
the inputs will be relatively small (I can't imagine using this for
sequences with 100s of 1000s elements).  Or you might sniff the type
and avoid the copy if you know it's a list or tuple.  Or you might use
Oren's favorite rule of thumb and listify it when iter(x) is iter(x)
(or iter(x) is x).

> Best may be to let the user explicitly transform input iterators into
> lists by calling `list' explicitly those arguments.  That might be an
> acceptable compromise.

Maybe.

> > I expect that Eric Raymond's powerset is much faster than your recursive
> > subsets() [...]
> 
> Very likely, yes.  I started `cogen' with algorithms looking a bit all
> alike, and did not look at speed.  OK, I'll switch.  I do not have Eric's
> algorithm handy, but if I remember well, it merely mapped successive
> integers to subsets by associating each bit with an element.

def powerset(base):
    """Powerset of an iterable, yielding lists."""
    pairs = [(2**i, x) for i, x in enumerate(base)]
    for n in xrange(2**len(pairs)):
        yield [x for m, x in pairs if m&n]

--Guido van Rossum (home page: http://www.python.org/~guido/)