[Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

François Pinard pinard@iro.umontreal.ca
28 Aug 2002 12:12:41 -0400

[Guido van Rossum]

> > def cartesian(*sequences): [...]

> It occurred to me that this is rather ineffecient because it invokes
> itself recursively many time (once for each element in the first
> sequence).  This version is much faster, because iterating over a
> built-in sequence (like a list) is much faster than iterating over a
> generator:

Granted, thanks!  I postponed optimisations for the first draft of `cogen',
but if it looks acceptable overall, we can try to get some speed of it now.

> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         for x in cartesian(*head):
>             for y in tail:
>                     yield x + [y]

> I also wonder if perhaps ``tail = list(tail)'' should be inserted
> just before the for loop, so that the arguments may be iterators as
> well.

`cogen' does not make any special effort for protecting iterators given as
input sequences.  `cartesian' is surely not the only place where iterators
would create a problem.  Solving it as a special case for `cartesian'
only is not very nice.  Of course, we might transform all sequences into
lists everywhere in `cogen', but `list'-ing a list copies it, and I'm not
sure this would be such a good idea in the end.

Best may be to let the user explicitly transform input iterators into
lists by calling `list' explicitly those arguments.  That might be an
acceptable compromise.

> I expect that Eric Raymond's powerset is much faster than your recursive
> subsets() [...]

Very likely, yes.  I started `cogen' with algorithms looking a bit all
alike, and did not look at speed.  OK, I'll switch.  I do not have Eric's
algorithm handy, but if I remember well, it merely mapped successive
integers to subsets by associating each bit with an element.

François Pinard   http://www.iro.umontreal.ca/~pinard