[Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]
Oren Tirosh
oren-py-d@hishome.net
Wed, 28 Aug 2002 01:13:55 -0400
On Tue, Aug 27, 2002 at 09:26:00PM -0400, Guido van Rossum wrote:
> It occurred to me that this is rather inefficient because it invokes
> itself recursively many times (once for each element in the first
> sequence). This version is much faster, because iterating over a
> built-in sequence (like a list) is much faster than iterating over a
> generator:
>
> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         for x in cartesian(*head):
>             for y in tail:
>                 yield x + [y]
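For concreteness, the quoted generator yields one list per combination, in
row-major order; a minimal check (reproducing the quoted code with its
indentation restored) might look like:

```python
def cartesian(*sequences):
    # Guido's recursive generator from the quoted message.
    if len(sequences) == 0:
        yield []
    else:
        head, tail = sequences[:-1], sequences[-1]
        for x in cartesian(*head):
            for y in tail:
                yield x + [y]

print(list(cartesian([1, 2], ['a', 'b'])))
# [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]
```

Note the base case: with no arguments it yields a single empty list, which is
what lets the recursion bottom out.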
My implementation from http://www.tothink.com/python/dataflow/xlazy.py:
def xcartprod(arg1, *args):
    if not args:
        for x in arg1:
            yield (x,)
    elif len(args) == 1:
        arg2 = args[0]
        for x in arg1:
            for y in arg2:
                yield x, y
    else:
        for x in arg1:
            for y in xcartprod(args[0], *args[1:]):
                yield (x,) + y
Special-casing the 2-argument case helps a lot. It brings the performance
to within 50% of plain nested loops, which means that if you actually do
something inside the loop the overhead is quite negligible.
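A rough way to re-check that comparison (a sketch: the `A`/`B` inputs and
helper names are illustrative, absolute timings vary by machine and
interpreter, and the 50% figure is from the original measurement):

```python
import timeit

def xcartprod(arg1, *args):
    # Oren's generator, reproduced from above for a self-contained benchmark.
    if not args:
        for x in arg1:
            yield (x,)
    elif len(args) == 1:
        arg2 = args[0]
        for x in arg1:
            for y in arg2:
                yield x, y
    else:
        for x in arg1:
            for y in xcartprod(args[0], *args[1:]):
                yield (x,) + y

A = list(range(200))
B = list(range(200))

def via_generator():
    # Consume all pairs through the 2-argument fast path.
    n = 0
    for pair in xcartprod(A, B):
        n += 1
    return n

def nested_loops():
    # The hand-written equivalent the 50% figure compares against.
    n = 0
    for x in A:
        for y in B:
            n += 1
    return n

assert via_generator() == nested_loops() == 40000
print('generator:    %.3fs' % timeit.timeit(via_generator, number=20))
print('nested loops: %.3fs' % timeit.timeit(nested_loops, number=20))
```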
The 'x' prefix is shared with other functions in this module: a lazy xmap,
xzip and xfilter.
> I also wonder if perhaps ``tail = list(tail)'' should be inserted
> just before the for loop, so that the arguments may be iterators as
> well.
Ahh... re-iterability again...
This is a good example of a function that *fails silently* for
non-re-iterable arguments.
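The failure mode is easy to demonstrate with the quoted `cartesian`
(reproduced here so the example is self-contained): a one-shot iterator
passed as any argument but the first is exhausted on the first pass, and
the remaining combinations are silently dropped rather than raising an
error.

```python
def cartesian(*sequences):
    # The quoted recursive generator; the last sequence is re-iterated
    # once per element of the preceding ones.
    if len(sequences) == 0:
        yield []
    else:
        head, tail = sequences[:-1], sequences[-1]
        for x in cartesian(*head):
            for y in tail:
                yield x + [y]

# With lists, all four pairs appear:
print(list(cartesian([1, 2], ['a', 'b'])))
# [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]

# With a one-shot iterator as the second argument, the inner loop
# drains it while x == 1 and silently yields nothing for x == 2:
print(list(cartesian([1, 2], iter(['a', 'b']))))
# [[1, 'a'], [1, 'b']]
```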
Slurping the tail into a list loses the lazy efficiency of this function.
One of the ways I've used this function is to scan combinations until a
condition is satisfied. The iteration is always terminated before reaching
the end. Reading ahead may waste computation and memory.
All I want is something that will raise an exception if any argument but
the first is not re-iterable (e.g. my reiter() proposal). I'll add list()
to the argument myself if I really want to. Don't try to guess what I
meant.
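The reiter() proposal was never adopted, but one plausible sketch of the
idea (an assumption, not the proposed implementation) relies on the fact
that one-shot iterators return themselves from iter(), while re-iterable
containers return a fresh iterator:

```python
def reiter(obj):
    # Hypothetical sketch of the proposed reiter(): pass the object
    # through only if it looks re-iterable, i.e. iter(obj) produces a
    # fresh iterator rather than the object itself.
    it = iter(obj)
    if it is obj:
        raise TypeError('argument is not re-iterable: %r' % (obj,))
    return obj

reiter([1, 2, 3])            # fine: a list can be iterated repeatedly
try:
    reiter(iter([1, 2, 3]))  # a one-shot iterator is rejected loudly
except TypeError as e:
    print('rejected:', e)
```

A guard like this at the top of cartesian() would turn the silent wrong
answer into an immediate, explicit error, while still leaving the caller
free to wrap an iterator in list() deliberately.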
Oren