[Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]

Oren Tirosh oren-py-d@hishome.net
Wed, 28 Aug 2002 01:13:55 -0400

On Tue, Aug 27, 2002 at 09:26:00PM -0400, Guido van Rossum wrote:
> It occurred to me that this is rather ineffecient because it invokes
> itself recursively many time (once for each element in the first
> sequence).  This version is much faster, because iterating over a
> built-in sequence (like a list) is much faster than iterating over a
> generator:
> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         for x in cartesian(*head):
>             for y in tail:
>                     yield x + [y]

My implementation from http://www.tothink.com/python/dataflow/xlazy.py:

def xcartprod(arg1, *args):
    if not args:
        for x in arg1:
            yield (x,)
    elif len(args) == 1:
        arg2 = args[0]
        for x in arg1:
            for y in arg2:
                yield x, y
        for x in arg1:
            for y in xcartprod(args[0], *args[1:]):
                yield (x,) + y

Special-casing the 2 argument case helps a lot. It brings the performace
within 50% of nested loops which means that if you actually do something
inside the loop the overhead is quite negligible.

The 'x' prefix is shared with other functions in this module: a lazy xmap,
xzip and xfilter.

> I also wonder if perhaps ``tail = list(tail)'' should be inserted
> just before the for loop, so that the arguments may be iterators as
> well.

Ahh... re-iterability again...

This is a good example of a function that *fails silently* for non 
re-iterable arguments.

Slurping the tail into a list loses the lazy efficiency of this function. 
One of the ways I've used this function is to scan combinations until a 
condition is satisfied. The iteration is always terminated before reaching 
the end. Reading ahead may waste computation and memory.

All I want is something that will raise an exception if any argument but 
the first is not re-iterable (e.g. my reiter() proposal). I'll add list()
to the argument myself if I really want to. Don't try to guess what I