
This is based off of http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations . It's also discussed at http://reddit.com/info/5ywrs/comments/ and someone with a lot of spare time can read Knuth's fascile on the subject at http://www-cs-faculty.stanford.edu/~knuth/fasc2a.ps.gz . OK, suppose you have a situation where you need to loop through all combinations of 3 lists. The simple solution is to write three nested for-loops: for x in xs: for y in ys: for z in zs: #Do something Of course, this is obvious O(n^3), which is bad, but sometimes you don't have a choice. However, what if (to use an example I ran into), you're making up a truth table. If you know you have three variables, you might write: for a in [True, False]: for b in [True, False]: for c in [True, False]: print a, b, c Which yields True True True True True False True False True True False False False True True False True False False False True False False False But if you don't know how many variables you'll have, you're stuck writing a complicated function instead of using a nice, simple for-loop. So, going back to the first example, wouldn't it be nicer to write: for x, y, z in combine(xs, ys, zs): #Do something If nothing else, this has the advantage that if you don't have to nest the for-loops, things don't end up being so deeply indented on the screen. Similarly, a general truth table maker can be written: ts_and_fs = [[True, False]] * num_vars for variables in combine(*ts_and_fs): print variables So, I think a "combine" function (or some other, better name) would be a good thing to have, at least in the itertools, if not as a built-in function. How to implement it? Well, probably it should be done in C for efficiency, but of the versions done http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations the one that I like best is def combine(*sequences): #Test to guard against infinite recursion if sequences: def _inner(*seqs): if len(seqs) == 1: for i in seqs[0]: yield (i, ) else: for rest in combine(*seqs[:-1]): for i in seqs[-1]: yield rest + (i, ) return _inner(*sequences) else: #Empty generator that yields StopIteration def _inner(): return yield return _inner() However, I'm also convinced that there is a more efficient way to do this. So, what do you think? Is this a common enough need that it should be built into itertools? The main namespace? Or should we leave it out, since adding it would encourage people writing O(n^x) algorithms? If nothing else, list members can enjoy rewriting this function for fun. -- Carl Johnson

Carl Johnson wrote:
FWIW, I find myself doing this often with numpy: for index in numpy.ndindex(arr.shape): # do something using arr's indexes, whatever they are (Broadcasting is better, but sometimes this is unavoidable.) The point is there's a general use-case Carl left out: what if you don't know how deeply you'll need to nest? Our Functional friends know the answer to this, but IIRC the Python design philosophy doesn't actively encourage recursive solutions. I don't think the use-cases are common enough to warrant a new builtin (though that's just my $0.000001 opinion - it may be that *everyone* will want to flatten their nested loops), but something like "combine" would fit very well into itertools. Neil

On Thu, 22 May 2008, Carl Johnson wrote:
I think extra indentation helps, because it makes it clearer how the loops nest, but I suppose you could do this if you really wanted;
I've worked this out (I'm sure I did this in intro CS once before...), although there is probably some optimization trick I'm missing, and I'm probably doing some silly double-allocation thing or something.
I really, really don't think this is worth putting in any distributed package, as it's a pretty simple thing to code up if you're careful. That said, it _was_ a lot of fun rewriting it. Thanks. :D -- Cheers, Leif

On Fri, May 23, 2008 at 3:15 AM, Carl Johnson <carl@carlsensei.com> wrote:
<snip>
<snip>
<snip>
I really like this function, and I could have used it about a year ago. I actually like the name "combinations" better than "combine". IMO, "combinations" implies the intended meaning more directly, while "combine" could be combining the lists in some other way. Whatever the name, I think it would fit quite nicely into itertools. I don't think efficiency should be considered as a reason to leave it out, because as you said, there are simply cases (any time you need to enumerate all possible values) where you can't get around it. Besides, it's not too difficult to create quadratic or cubic lists using list comprehensions... the syntax is actually designed to support it. As for a stab at a good implementation: def combinations (*args): def combiner (seqs): try: seq0 = seqs.next() except StopIteration: yield () else: for prefix in combiner(iter(seqs)): for x in seq0: yield prefix + (x,) return combiner(reversed(args)) I tried to avoid unpacking and repacking lists, I returned the combinations as tuples, and I tried to use generators wherever possible. I didn't actually do any performance tests, though. Brandon

On Fri, May 23, 2008 at 4:38 AM, Brandon Mintern <bmintern@gmail.com> wrote:
I was incorrectly thinking that combiner would be called more than once on seqs, and thus I used iter(seqs) to avoid making a call on an exhausted iterator. It turns out that's unnecessary since there is only one recursive call per call to combiner, so the result is the somewhat improved: def combinations (*args): def combiner (seqs): try: seq0 = seqs.next() except StopIteration: yield () else: for prefix in combiner(seqs): for x in seq0: yield prefix + (x,) return combiner(reversed(args)) I think this is a solution I can be proud of, in terms of correctness (iterating in the proper order), readability, and efficiency. Brandon

Carl Johnson schrieb:
So, what do you think? Is this a common enough need that it should be built into itertools?
Presumably, since it has been added to itertools in 2.6 under the name product(). (Maybe Raymond borrowed the time machine?) :) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

On 23 May 2008, at 08:15, Carl Johnson wrote:
I don't understand why you define those _inner() generator functions. Anyway, here's a version that doesn't use recursion: def product(*sequences): if not sequences: return i, imax = 0, len(sequences)-1 val = [None]*(imax+1) iters = [iter(seq) for seq in sequences] while i >= 0: for val[i] in iters[i]: if i == imax: yield tuple(val) else: i += 1 break else: iters[i] = iter(sequences[i]) i -= 1
It is already in itertools: Python 3.0a4+ (py3k:62388, Apr 19 2008, 15:34:00) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information.
-- Arnaud

Carl Johnson wrote:
FWIW, I find myself doing this often with numpy: for index in numpy.ndindex(arr.shape): # do something using arr's indexes, whatever they are (Broadcasting is better, but sometimes this is unavoidable.) The point is there's a general use-case Carl left out: what if you don't know how deeply you'll need to nest? Our Functional friends know the answer to this, but IIRC the Python design philosophy doesn't actively encourage recursive solutions. I don't think the use-cases are common enough to warrant a new builtin (though that's just my $0.000001 opinion - it may be that *everyone* will want to flatten their nested loops), but something like "combine" would fit very well into itertools. Neil

On Thu, 22 May 2008, Carl Johnson wrote:
I think extra indentation helps, because it makes it clearer how the loops nest, but I suppose you could do this if you really wanted;
I've worked this out (I'm sure I did this in intro CS once before...), although there is probably some optimization trick I'm missing, and I'm probably doing some silly double-allocation thing or something.
I really, really don't think this is worth putting in any distributed package, as it's a pretty simple thing to code up if you're careful. That said, it _was_ a lot of fun rewriting it. Thanks. :D -- Cheers, Leif

On Fri, May 23, 2008 at 3:15 AM, Carl Johnson <carl@carlsensei.com> wrote:
<snip>
<snip>
<snip>
I really like this function, and I could have used it about a year ago. I actually like the name "combinations" better than "combine". IMO, "combinations" implies the intended meaning more directly, while "combine" could be combining the lists in some other way. Whatever the name, I think it would fit quite nicely into itertools. I don't think efficiency should be considered as a reason to leave it out, because as you said, there are simply cases (any time you need to enumerate all possible values) where you can't get around it. Besides, it's not too difficult to create quadratic or cubic lists using list comprehensions... the syntax is actually designed to support it. As for a stab at a good implementation: def combinations (*args): def combiner (seqs): try: seq0 = seqs.next() except StopIteration: yield () else: for prefix in combiner(iter(seqs)): for x in seq0: yield prefix + (x,) return combiner(reversed(args)) I tried to avoid unpacking and repacking lists, I returned the combinations as tuples, and I tried to use generators wherever possible. I didn't actually do any performance tests, though. Brandon

On Fri, May 23, 2008 at 4:38 AM, Brandon Mintern <bmintern@gmail.com> wrote:
I was incorrectly thinking that combiner would be called more than once on seqs, and thus I used iter(seqs) to avoid making a call on an exhausted iterator. It turns out that's unnecessary since there is only one recursive call per call to combiner, so the result is the somewhat improved: def combinations (*args): def combiner (seqs): try: seq0 = seqs.next() except StopIteration: yield () else: for prefix in combiner(seqs): for x in seq0: yield prefix + (x,) return combiner(reversed(args)) I think this is a solution I can be proud of, in terms of correctness (iterating in the proper order), readability, and efficiency. Brandon

Carl Johnson schrieb:
So, what do you think? Is this a common enough need that it should be built into itertools?
Presumably, since it has been added to itertools in 2.6 under the name product(). (Maybe Raymond borrowed the time machine?) :) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

On 23 May 2008, at 08:15, Carl Johnson wrote:
I don't understand why you define those _inner() generator functions. Anyway, here's a version that doesn't use recursion: def product(*sequences): if not sequences: return i, imax = 0, len(sequences)-1 val = [None]*(imax+1) iters = [iter(seq) for seq in sequences] while i >= 0: for val[i] in iters[i]: if i == imax: yield tuple(val) else: i += 1 break else: iters[i] = iter(sequences[i]) i -= 1
It is already in itertools: Python 3.0a4+ (py3k:62388, Apr 19 2008, 15:34:00) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information.
-- Arnaud
participants (7)
-
Arnaud Delobelle
-
Brandon Mintern
-
Carl Johnson
-
Georg Brandl
-
George Sakkis
-
Leif Walsh
-
Neil Toronto