
On Fri, May 23, 2008 at 3:15 AM, Carl Johnson <carl@carlsensei.com> wrote:
This is based off of http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations . It's also discussed at http://reddit.com/info/5ywrs/comments/ and someone with a lot of spare time can read Knuth's fascile on the subject at http://www-cs-faculty.stanford.edu/~knuth/fasc2a.ps.gz .
OK, suppose you have a situation where you need to loop through all combinations of 3 lists. The simple solution is to write three nested for-loops:
for x in xs: for y in ys: for z in zs: #Do something
Of course, this is obvious O(n^3), which is bad, but sometimes you don't have a choice. However, what if (to use an example I ran into), you're making up a truth table. If you know you have three variables, you might write:
for a in [True, False]: for b in [True, False]: for c in [True, False]: print a, b, c
<snip>
But if you don't know how many variables you'll have, you're stuck writing a complicated function instead of using a nice, simple for-loop.
So, going back to the first example, wouldn't it be nicer to write:
for x, y, z in combine(xs, ys, zs): #Do something
<snip>
So, I think a "combine" function (or some other, better name) would be a good thing to have, at least in the itertools, if not as a built-in function. How to implement it? Well, probably it should be done in C for efficiency, but of the versions done http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations the one that I like best is
<snip>
However, I'm also convinced that there is a more efficient way to do this.
So, what do you think? Is this a common enough need that it should be built into itertools? The main namespace? Or should we leave it out, since adding it would encourage people writing O(n^x) algorithms? If nothing else, list members can enjoy rewriting this function for fun.
-- Carl Johnson
I really like this function, and I could have used it about a year ago. I actually like the name "combinations" better than "combine". IMO, "combinations" implies the intended meaning more directly, while "combine" could be combining the lists in some other way. Whatever the name, I think it would fit quite nicely into itertools. I don't think efficiency should be considered as a reason to leave it out, because as you said, there are simply cases (any time you need to enumerate all possible values) where you can't get around it. Besides, it's not too difficult to create quadratic or cubic lists using list comprehensions... the syntax is actually designed to support it. As for a stab at a good implementation: def combinations (*args): def combiner (seqs): try: seq0 = seqs.next() except StopIteration: yield () else: for prefix in combiner(iter(seqs)): for x in seq0: yield prefix + (x,) return combiner(reversed(args)) I tried to avoid unpacking and repacking lists, I returned the combinations as tuples, and I tried to use generators wherever possible. I didn't actually do any performance tests, though. Brandon