Extracting all the possible combinations of a grid
Hi all, I want to generate all the possible triplets of integers in [0, n]. I am wondering what the best possible way to do this is. To make things clearer, I could generate i, j, k using indices: i, j, k = indices((n, n, n)) But I will have the same triplet several times with different orderings. I am looking for a loop-free way of creating three arrays i, j, k with all the triplets present once, and only once. Any hint appreciated. Cheers, Gaël PS: I am having problems with my mail, so excuse me if this is a dup
On 9/21/07, Gael Varoquaux
Hi all,
I want to generate all the possible triplets of integers in [0, n]. I am wondering what the best possible way to do this is.
To make things clearer, I could generate i, j, k using indices:
i, j, k = indices((n, n, n))
But I will have the same triplet several times with different orderings. I am looking for a loop-free way of creating three arrays i, j, k with all the triplets present once, and only once.
Any hint appreciated.
Go here, http://www.cs.utsa.edu/~wagner/knuth/. I think you want fascicle 4A, http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf. Some of the fascicles from Vol 4 of TAOCP are now in print, http://tinyurl.com/2goxpr. Chuck
On 9/21/07, Charles R Harris
On 9/21/07, Gael Varoquaux
wrote: Hi all,
I want to generate all the possible triplets of integers in [0, n]. I am wondering what the best possible way to do this is.
To make things clearer, I could generate i, j, k using indices:
i, j, k = indices((n, n, n))
But I will have the same triplet several times with different orderings. I am looking for a loop-free way of creating three arrays i, j, k with all the triplets present once, and only once.
Any hint appreciated.
Go here, http://www.cs.utsa.edu/~wagner/knuth/. I think you want fascicle 4A, http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf. Some of the fascicles from Vol 4 of TAOCP are now in print, http://tinyurl.com/2goxpr.
Oops, make that fascicle 3A, not 4A. Chuck
On Fri, Sep 21, 2007 at 01:52:31PM -0600, Charles R Harris wrote:
Go here, http://www.cs.utsa.edu/~wagner/knuth/. I think you want fascicle 4A, http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf. Some of the fascicles from Vol 4 of TAOCP are now in print, http://tinyurl.com/2goxpr.
:->. That's the best answer I have ever had so far: RTFAOCP ! OK, I'll have a look, but I'd be surprised he talks about loop-free ways. Anyhow, I have kludged a solution that seems to be working. Not the most beautiful ever, but it seems to be working. I will need to time it, but first I need to ask the end user what the actual numbers are. Here is the kludge, feel free to laugh:

+++++++++++++++++++++++++++++++++++++++++++++++++++++
from numpy import reshape, indices, arange, take, diff

def unique_nplets(size, dims):
    """ Generate all the possible unique combinations of n=dims
        integers below size.
    """
    # Generate all the possible nplets
    ##triplets = reshape(mgrid[0:2, 0:2, 0:2], (3, -1))
    nplets = reshape(indices(dims*[size,]), (dims, -1))
    # Sort them
    nplets.sort(axis=0)
    nplets.sort(axis=-1)
    # Compare successive columns using diff, then sum the diffs (all
    # positive numbers); if the sum is not zero, the columns are
    # different. Make a mask out of this condition, and apply it to
    # arange to retrieve the indices of these columns
    unique_indices = arange(size**dims)[(diff(nplets).sum(axis=0) > 0)]
    # Retrieve the unique columns
    unique_nplets = take(nplets, unique_indices, axis=-1)
    return unique_nplets
+++++++++++++++++++++++++++++++++++++++++++++++++++++

I was actually expecting numpy (or scipy) to have functions built-in for these kinds of problems. Or to have people on the list having already done this. Thanks for the reply, I will indeed look it up. Gaël
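For comparison, in later Python versions the standard library builds such a table directly: itertools.combinations arrived in Python 2.6, shortly after this thread (use combinations_with_replacement if triplets with repeated entries, like (0, 0, 1), should also be kept). A minimal sketch:

```python
from itertools import combinations

import numpy as np

# One row per unordered triplet of distinct integers in [0, n),
# emitted in strictly increasing order (i, j, k).
n = 5
triplets = np.array(list(combinations(range(n), 3)))

print(triplets.shape)  # (10, 3): C(5, 3) == 10 unique triplets
```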
On 9/21/07, Gael Varoquaux
On Fri, Sep 21, 2007 at 01:52:31PM -0600, Charles R Harris wrote:
Go here, http://www.cs.utsa.edu/~wagner/knuth/. I think you want fascicle 4A, http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf. Some of the fascicles from Vol 4 of TAOCP are now in print, http://tinyurl.com/2goxpr.
:->. That's the best answer I have ever had so far: RTFAOCP !
OK, I'll have a look, but I'd be surprised he talks about loop free ways.
Anyhow, I have kludged a solution that seems to be working. Not the most beautiful ever, but it seems to be working. I will need to time it, but first I need to ask the end user what the actual numbers are.
<snip> I was actually expecting numpy (or scipy) to have functions built-in for these kinds of problems. Or to have people on the list having already done this.
I wrote up some of the combinatorial algorithms in python a few years ago for my own use in writing a paper, (Harris, C. R. Solution of the aliasing and least squares problems of spaced antenna interferometric measurements using lattice methods, Radio Sci. 38, 2003). I even thought I had found an error and have a letter from Knuth pointing out that I was mistaken ;) Anyway, there are a lot of neat things in volume 4 and it is well worth the read. As to putting these things in scipy, I wouldn't mind at all if there was a cs kit with various trees, union-find (equivalence relation) structures, indexing, combinatorial generation, and graph algorithms, but I am not sure how well they would fit in. Chuck
On Fri, Sep 21, 2007 at 02:33:42PM -0600, Charles R Harris wrote:
I wrote up some of the combinatorial algorithms in python a few years ago for my own use in writing a paper, ( Harris, C. R. Solution of the aliasing and least squares problems of spaced antenna interferometric measurements using lattice methods, Radio Sci. 38, 2003). I even thought I had found an error and have a letter from Knuth pointing out that I was mistaken ;) Anyway, there are a lot of neat things in volume 4 and it is well worth the read.
It is. I did my homework, and now I understand why you point this out. Basically array programming is really not suited for this kind of thing. The problem with my solution is that it blows up the memory really quickly. It is actually a pretty poor solution. It is obvious when you think about this a bit that the problem diverges really quickly if you try the brute force approach. It is actually really quick, until it blows the memory. I'll wait to know exactly what the numbers are (I am doing this to help a friend), and I'll see if I keep my kludge, if I use one of Knuth's nice algorithms, or if I simply implement a for loop in C + weave inline.
As to putting these things in scipy, I wouldn't mind at all if there was a cs kit with various trees, union-find (equivalence relation) structures, indexing, combinatorial generation, and graph algorithms, but I am not sure how well they would fit in.
That would be great. I have wanted these things a few times. I am not sure either how to fit them in. I don't really know how to fit graphs and trees with arrays. Thanks for your wise words, Gaël
On 9/21/07, Gael Varoquaux
I wrote up some of the combinatorial algorithms in python a few years ago for my own use in writing a paper, ( Harris, C. R. Solution of the aliasing and least squares problems of spaced antenna interferometric measurements using lattice methods, Radio Sci. 38, 2003). I even
On Fri, Sep 21, 2007 at 02:33:42PM -0600, Charles R Harris wrote: thought I
had found an error and have a letter from Knuth pointing out that I was mistaken ;) Anyway, there are a lot of neat things in volume 4 and it is well worth the read.
It is. I did my homework, and now I understand why you point this out. Basically array programming is really not suited for this kind of thing. The problem with my solution is that it blows up the memory really quickly. It is actually a pretty poor solution. It is obvious when you think about this a bit that the problem diverges really quickly if you try the brute force approach. It is actually really quick, until it blows the memory.
I found generators a good approach to this sort of thing: for (i,j,k) in triplets(n) : Chuck
On Fri, Sep 21, 2007 at 02:58:43PM -0600, Charles R Harris wrote:
I found generators a good approach to this sort of thing:
for (i,j,k) in triplets(n) :
That's what we were doing so far, but in the code itself. The result was unbearably slow. I think for our purposes we should build a precalculated table of these nuplets, and then do array calculations on them. I was just wondering what the good way to build this table was. And I immediately thought of using tricks on arrays without realizing the speed at which the number of combinations diverged. In the worst case, a set of nested for loops in weave.inline-wrapped C code populating an array of size (n**d/d!, d) would do the trick and be real fast. Something like (implemented in C, rather than Python):

index = 0
for i in xrange(n):
    for j in xrange(i):
        for k in xrange(j):
            index += 1
            out_array(index, 0) = i
            out_array(index, 1) = j
            out_array(index, 2) = k

This would scale pretty well IMHO. Now if this array does not fit in memory, then we have a problem. It means we need to generate a set of smaller arrays. Which should not be a tremendous problem: just extract the outer loop from the code in C, and produce the arrays one by one (this can be implemented with a generator, I agree). I have already done this trick of grouping operations on arrays slightly smaller than the max memory in other code and it worked very well, much better than not using arrays. Now we need some numbers on the size of the problem to proceed. But it's Friday night in France, and the kid must be out having fun :->. Thanks for your advice, Gaël
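The nested-loop sketch above, rendered as runnable Python (a sketch of mine, not the actual weave code; the C version would have the same structure):

```python
import numpy as np

def triplet_table(n):
    """Table of all strictly decreasing triplets (i, j, k) with
    n > i > j > k >= 0: one row per unique unordered triplet."""
    rows = []
    for i in range(n):
        for j in range(i):
            for k in range(j):
                rows.append((i, j, k))
    return np.array(rows)

table = triplet_table(5)
print(table.shape)  # (10, 3): C(5, 3) == 10 rows
```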
On 9/21/07, Gael Varoquaux
On Fri, Sep 21, 2007 at 02:58:43PM -0600, Charles R Harris wrote:
I found generators a good approach to this sort of thing:
for (i,j,k) in triplets(n) :
That's what we were doing so far, but in the code itself. The result was unbearably slow. I think for our purposes we should build a precalculated table of these nuplets, and then do array calculations on them.
I was just wondering what the good way to build this table was. And I immediately thought of using tricks on arrays without realizing the speed at which the number of combinations diverged.
In the worst case, a set of nested for loops in weave.inline-wrapped C code populating an array of size (n**d/d!, d) would do the trick and be real fast. Something like (implemented in C, rather than Python):
index = 0
for i in xrange(n):
    for j in xrange(i):
        for k in xrange(j):
            index += 1
            out_array(index, 0) = i
            out_array(index, 1) = j
            out_array(index, 2) = k
This would scale pretty well IMHO.
Try

def triplet(n) :
    out = []
    for i in xrange(2,n) :
        for j in xrange(1,i) :
            for k in xrange(0,j) :
                out.append((i,j,k))
    return out

It's the first algorithm in Knuth, mentioned in passing.

In [7]: time a = triplet(100)
CPU times: user 0.14 s, sys: 0.01 s, total: 0.14 s
Wall time: 0.15

Which isn't too bad for 161700 combinations. However, I still prefer the generator form if you want to save memory for large n.

In [2]: def triplet(n) :
   ...:     for i in xrange(2,n) :
   ...:         for j in xrange(1,i) :
   ...:             for k in xrange(0,j) :
   ...:                 yield i,j,k
   ...:

It's a bit slower for making lists, however.

In [24]: time for i,j,k in triplet(100) : a.append((i,j,k))
CPU times: user 0.24 s, sys: 0.01 s, total: 0.25 s
Wall time: 0.25

The more general algorithm L isn't too bad either

In [29]: def combination(n,t) :
   ....:     c = arange(t + 2)
   ....:     c[-1] = 0
   ....:     c[-2] = n
   ....:     while 1 :
   ....:         yield c[:t]
   ....:         j = 0
   ....:         while c[j] + 1 == c[j+1] :
   ....:             c[j] = j
   ....:             j += 1
   ....:         if j >= t :
   ....:             return
   ....:         c[j] += 1

In [30]: for i in combination(5,3) : print i
   ....:
[0 1 2]
[0 1 3]
[0 2 3]
[1 2 3]
[0 1 4]
[0 2 4]
[1 2 4]
[0 3 4]
[1 3 4]
[2 3 4]

In [31]: time for i in combination(100,3) : pass
CPU times: user 0.58 s, sys: 0.02 s, total: 0.60 s
Wall time: 0.60

Chuck
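As a quick sanity check (mine, on a modern Python; itertools.combinations postdates this thread), Algorithm L visits exactly the C(n, t) combinations:

```python
from itertools import combinations

import numpy as np

def combination(n, t):
    """Knuth's Algorithm L: visit every t-combination of {0, ..., n-1}."""
    c = np.arange(t + 2)
    c[-1] = 0
    c[-2] = n
    while True:
        # Copy out as plain ints: c is mutated in place below.
        yield tuple(int(x) for x in c[:t])
        j = 0
        while c[j] + 1 == c[j + 1]:
            c[j] = j
            j += 1
        if j >= t:
            return
        c[j] += 1

got = sorted(combination(10, 3))
print(got == list(combinations(range(10), 3)))  # True: same 120 triplets
```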
On Fri, Sep 21, 2007 at 06:40:52PM -0600, Charles R Harris wrote:
def triplet(n) :
    out = []
    for i in xrange(2,n) :
        for j in xrange(1,i) :
            for k in xrange(0,j) :
                out.append((i,j,k))
    return out
I need quadruplets, numbers going up first to 120 then to 200. I tried this, but I can't even go to 120, I simply flood my system (1GB of RAM, AMD64) by using all my RAM.
Which isn't too bad for 161700 combinations. However, I still prefer the generator form if you want to save memory for large n.
In [2]: def triplet(n) :
   ...:     for i in xrange(2,n) :
   ...:         for j in xrange(1,i) :
   ...:             for k in xrange(0,j) :
   ...:                 yield i,j,k
That's the way we were doing it, but you really want to build arrays when doing this, because the operations afterwards take ages. Maybe I could build arrays by chunks of, say, 10^6, and keep iterating until I run out.
In [29]: def combination(n,t) :
   ....:     c = arange(t + 2)
   ....:     c[-1] = 0
   ....:     c[-2] = n
   ....:     while 1 :
   ....:         yield c[:t]
   ....:         j = 0
   ....:         while c[j] + 1 == c[j+1] :
   ....:             c[j] = j
   ....:             j += 1
   ....:         if j >= t :
   ....:             return
   ....:         c[j] += 1
I didn't try that one... Wow, that one is pretty good! 35s for quadruplets going up to 120, 270s going up to 200. I can use something like itertools.groupby to build arrays by grouping the results.
I have implementations in C with weave inline that work pretty well:
* For numbers up to 120, this brute force approach is just fine:
##############################################################################
def unique_triplets(size):
    """ Returns the arrays of all the possible unique combinations of three
        integers below size."""
    C_code = """
    int index = 0;
    for (int j=0; j
On Sat, Sep 22, 2007 at 10:35:16AM +0200, Gael Varoquaux wrote:
I would go for the "generate_fourplets" solution if I had a way to calculate the binomial coefficient without overflows.
Sorry, premature optimisation is the root of all evil, but turning one's brain on early is good.

"""
##############################################################################
# Some routines for calculation of binomial coefficients
def gcd(m, n):
    while n:
        m, n = n, m%n
    return m

def binom_(n, k):
    if k == 0:
        return 1
    else:
        g = gcd(n, k)
        return binomial(n-1, k-1)/(k/g)*(n/g)

def binomial(n, k):
    if k > n/2: # Limit recursion depth
        k = n-k
    return binom_(n, k)
"""

This is surprisingly fast (surprising for me, at least). Using this and the C code I have, I can generate the quadruplets of 200 integers quite quickly:

In [5]: %timeit b = [1 for i in generate_quadruplets(200)]
10 loops, best of 3: 1.61 s per loop

With generate_quadruplets given by:

"""
##############################################################################
def generate_quadruplets(size):
    """ Returns an iterator on tables listing all the possible unique
        combinations of four integers below size. """
    C_code = """
    int index = 0;
    for (int j=0; j
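The gcd trick above keeps every intermediate an exact integer, which is why it sidesteps overflow. In Python 3 the same routine can be cross-checked against math.comb (3.8+); a sketch of mine using floor division, since / now returns floats:

```python
from math import comb

def gcd(m, n):
    # Euclid's algorithm, as in the post above
    while n:
        m, n = n, m % n
    return m

def binomial(n, k):
    """Exact binomial via C(n, k) = C(n-1, k-1) * n / k, dividing the
    gcd out of n and k first so every intermediate stays an integer."""
    if k > n // 2:       # limit recursion depth, as in the original
        k = n - k
    if k == 0:
        return 1
    g = gcd(n, k)
    return binomial(n - 1, k - 1) // (k // g) * (n // g)

print(binomial(200, 4) == comb(200, 4) == 64684950)  # True
```

The integer divisions are exact because k/g divides C(n-1, k-1) once the common factor g = gcd(n, k) has been removed from n and k.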
On 9/22/07, Gael Varoquaux
On Sat, Sep 22, 2007 at 10:35:16AM +0200, Gael Varoquaux wrote:
I would go for the "generate_fourplets" solution if I had a way to calculate the binomial coefficient without overflows.
Sorry, premature optimisation is the root of all evil, but turning one's brain on early is good.
"""
##############################################################################
# Some routines for calculation of binomial coefficients
def gcd(m, n):
    while n:
        m, n = n, m%n
    return m
def binom_(n, k):
    if k == 0:
        return 1
    else:
        g = gcd(n, k)
        return binomial(n-1, k-1)/(k/g)*(n/g)
def binomial(n, k):
    if k > n/2: # Limit recursion depth
        k = n-k
    return binom_(n, k)
"""
This is surprisingly fast (surprising for me, at least).
Using this and the C code I have, I can generate the quadruplets of 200 integers quite quickly:
In [5]: %timeit b = [1 for i in generate_quadruplets(200)]
10 loops, best of 3: 1.61 s per loop
With generate_quadruplets given by:
"""
##############################################################################
def generate_quadruplets(size):
    """ Returns an iterator on tables listing all the possible unique
        combinations of four integers below size. """
    C_code = """
    int index = 0;
    for (int j=0; j
    for i in xrange(size):
        multiset_coef = binomial(i+3, 3)
        quadruplets = empty((multiset_coef, 4), dtype=uint32)
        inline(C_code, ['quadruplets', 'i'],
               type_converters=converters.blitz)
        yield quadruplets
"""
Umm... that doesn't look quite right. Shouldn't it be something like

def generate_quadruplets(size):
    """ Returns an iterator on tables listing all the possible unique
        combinations of four integers below size. """
    C_code = """
    int index = 0;
    for (int j=2; j

Algorithm L can be chunked pretty easily also:

            ...
            if j >= t :
                break
            c[j] += 1
        yield out[:i+1]
        if j >= t :
            return

I think this would go well as a C++ function object. The python wrapper would look something like

def combination(n, t, chunk) :
    next = cpp_combination(n, t, chunk)
    while 1 :
        out = next()
        if len(out) > 0 :
            yield out
        else :
            return

Chuck
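The chunked Algorithm L that the truncated code above sketches might look like the following; this is my own reconstruction under stated assumptions (buffer size, copy semantics), not necessarily the code from the post:

```python
from itertools import combinations

import numpy as np

def chunked_combination(n, t, chunk):
    """Algorithm L, buffered: yield arrays of up to `chunk` rows, each row
    one t-combination of {0, ..., n-1}, instead of one tuple at a time."""
    c = np.arange(t + 2)
    c[-1] = 0
    c[-2] = n
    out = np.zeros((chunk, t), dtype=int)
    done = False
    while not done:
        i = 0
        while i < chunk and not done:
            out[i] = c[:t]
            i += 1
            j = 0
            while c[j] + 1 == c[j + 1]:
                c[j] = j
                j += 1
            if j >= t:
                done = True
            else:
                c[j] += 1
        # Copy so the next refill does not clobber rows the caller holds.
        yield out[:i].copy()

total = sum(len(block) for block in chunked_combination(10, 3, 16))
print(total)  # 120 == C(10, 3)
```

Each yielded block can be processed with array operations, so memory use is bounded by the chunk size rather than by C(n, t).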
On Sat, Sep 22, 2007 at 09:58:33AM -0600, Charles R Harris wrote:
Umm... that doesn't look quite right. Shouldn't it be something like
Puzzling. My implementation works, as far as I could test it, which I did as much as I could. Maybe the two are equivalent.
Algorithm L can be chunked pretty easily also:
Yes, I think this would scale even better for really big sets, as the size of the chunk can be more easily controlled. The good thing is that with chunks like this the calculation is fully "local", so if the result of the calculation does not depend on the number of points (e.g. it's a sum of the calculation for each point) then the calculation can go on for ever; there is no limit to the number of points other than the patience of the operator. So if the algorithm is right, they can leave it running for two days, and get great results for a paper. Thanks for your input, Gaël
Charles Harris posted a generator function for generating combinations (based on Knuth's fascicle). I get the expected results by iterating over the resulting generator, but not if I let ``list`` do that for me. What is more, changing ``arange`` to ``range`` in the code eliminates this anomaly. What am I failing to understand here? Thank you, Alan Isaac

################ example ###############################
def combinations(n,t):
    c = arange(t + 2)
    c[-2] = n
    c[-1] = 0
    while 1:
        yield c[:t]
        j = 0
        while c[j] + 1 == c[j+1] :
            c[j] = j
            j += 1
        if j >= t :
            return
        c[j] += 1

print "I get the expected results by iterating:"
for tpl in combinations(5,3):
    print tpl
print "However list conversion fails:"
print list(combinations(5,3))
############# end example ###############################
On 9/22/07, Alan G Isaac
Charles Harris posted a generator function for generating combinations (based on Knuth's fascicle). I get the expected results by iterating over the resulting generator, but not if I let ``list`` do that for me. What is more, changing ``arange`` to ``range`` in the code eliminates this anomaly.
What am I failing to understand here?
Thank you, Alan Isaac
There are a couple of potential problems if you want to make a list. Because an array view is returned, and the data is updated in the loop, all the views will end up with the same content. I used arrays and views for speed. To fix that, you need to return a copy, i.e., yield c[:t].copy(). That way you will end up with a list of arrays. If you do yield list(c[:t]), you will get a list of lists. Or, you can do as you did and just use range instead of arange, because a slice of a list returns a copy. I admit I'm curious about the speed of the two approaches; lists may be faster than arrays. Let's see.... combination returns array copies, combination1 uses range.

In [7]: time for i in combination(100,3) : pass
CPU times: user 0.89 s, sys: 0.07 s, total: 0.96 s
Wall time: 0.96

In [8]: time for i in combination1(100,3) : pass
CPU times: user 0.17 s, sys: 0.01 s, total: 0.18 s
Wall time: 0.18

Wow, massive speed up using lists, almost as fast as nested loops. I expect lists benefit from faster indexing and faster creation. I think your range fix is the way to go. Things slow down a bit for the full list treatment, but not too much:

In [13]: time a = list(combination1(100,3))
CPU times: user 0.26 s, sys: 0.00 s, total: 0.27 s
Wall time: 0.27

In [14]: time a = [i for i in combination1(100,3)]
CPU times: user 0.35 s, sys: 0.01 s, total: 0.36 s
Wall time: 0.36

Chuck
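The view-aliasing pitfall described above is easy to demonstrate in isolation (a minimal sketch of mine, not code from the thread):

```python
import numpy as np

def gen_views():
    c = np.arange(3)
    for _ in range(3):
        yield c[:2]         # a view into c; mutated by the next line
        c += 10

def gen_copies():
    c = np.arange(3)
    for _ in range(3):
        yield c[:2].copy()  # a snapshot, immune to later mutation
        c += 10

views = list(gen_views())    # list() exhausts the generator first,
copies = list(gen_copies())  # so every view sees the final state of c

print([v.tolist() for v in views])   # [[30, 31], [30, 31], [30, 31]]
print([c.tolist() for c in copies])  # [[0, 1], [10, 11], [20, 21]]
```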
On 9/22/07, Charles R Harris
On 9/22/07, Alan G Isaac
wrote: Charles Harris posted a generator function for generating combinations (based on Knuth's fascicle). I get the expected results by iterating over the resulting generator, but not if I let ``list`` do that for me. What is more, changing ``arange`` to ``range`` in the code eliminates this anomaly.
What am I failing to understand here?
Thank you, Alan Isaac
There are a couple of potential problems if you want to make a list. Because an array view is returned, and the data is updated in the loop, all the views will end up with the same content. I used arrays and views for speed. To fix that, you need to return a copy, i.e., yield c[:t].copy(). That way you will end up with a list of arrays. If you do yield list(c[:t]), you will get a list of lists. Or, you can do as you did and just use range instead of arange, because a slice of a list returns a copy. I admit I'm curious about the speed of the two approaches; lists may be faster than arrays. Let's see.... combination returns array copies, combination1 uses range.
In [7]: time for i in combination(100,3) : pass
CPU times: user 0.89 s, sys: 0.07 s, total: 0.96 s
Wall time: 0.96

In [8]: time for i in combination1(100,3) : pass
CPU times: user 0.17 s, sys: 0.01 s, total: 0.18 s
Wall time: 0.18

Wow, massive speed up using lists, almost as fast as nested loops. I expect lists benefit from faster indexing and faster creation. I think your range fix is the way to go. Things slow down a bit for the full list treatment, but not too much:

In [13]: time a = list(combination1(100,3))
CPU times: user 0.26 s, sys: 0.00 s, total: 0.27 s
Wall time: 0.27

In [14]: time a = [i for i in combination1(100,3)]
CPU times: user 0.35 s, sys: 0.01 s, total: 0.36 s
Wall time: 0.36
It's even faster in C++

In [1]: from _combination import combination

In [2]: time a = combination(100,3)
CPU times: user 0.09 s, sys: 0.01 s, total: 0.09 s
Wall time: 0.09

That's for the whole nested list. Arrays would probably be faster in this case because I am calling the python functions to add objects to the lists. Chuck
Gael Varoquaux wrote:
On Fri, Sep 21, 2007 at 02:58:43PM -0600, Charles R Harris wrote:
I found generators a good approach to this sort of thing:
for (i,j,k) in triplets(n) :
That's what we were doing so far, but in the code itself. The result was unbearably slow. I think for our purposes we should build a precalculated table of these nuplets, and then do array calculations on them.
I'm not sure whether this helps, but I have found this generator-based recipe http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/190465 very useful in the past. The comment "If you require the complete list of permutations, just use the built-in list() operator" may apply to your situation, Gary R.
On 9/21/07, Gary Ruben
Gael Varoquaux wrote:
On Fri, Sep 21, 2007 at 02:58:43PM -0600, Charles R Harris wrote:
I found generators a good approach to this sort of thing:
for (i,j,k) in triplets(n) :
That's what we were doing so far, but in the code itself. The result was unbearably slow. I think for our purposes we should build a precalculated table of these nuplets, and then do array calculations on them.
I'm not sure whether this helps, but I have found this generator-based recipe http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/190465 very useful in the past. The comment "If you require the complete list of permutations, just use the built-in list() operator" may apply to your situation,
Not bad for the triplets; I would have thought the cost of recursion would be greater. The use of lists slows it down a bit for bigger combinations.

In [50]: time for i in xuniqueCombinations(range(100),3) : pass
CPU times: user 0.56 s, sys: 0.02 s, total: 0.58 s
Wall time: 0.57

In [51]: time for i in xuniqueCombinations(range(20),10) : pass
CPU times: user 2.17 s, sys: 0.07 s, total: 2.23 s
Wall time: 2.23

In [52]: time for i in combination(20,10) : pass
CPU times: user 0.96 s, sys: 0.01 s, total: 0.97 s
Wall time: 0.97

Chuck
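The ASPN recipe's xuniqueCombinations is a short recursive generator; this is my reconstruction of its structure (treat the details as an assumption, the recipe itself is at the URL in the quoted message):

```python
def xuniqueCombinations(items, n):
    """Yield each length-n combination of `items` once, in order:
    take items[i] as the first element, then recurse on what follows it."""
    if n == 0:
        yield []
    else:
        for i in range(len(items) - n + 1):
            for rest in xuniqueCombinations(items[i + 1:], n - 1):
                yield [items[i]] + rest

print(list(xuniqueCombinations([1, 2, 3], 2)))      # [[1, 2], [1, 3], [2, 3]]
print(len(list(xuniqueCombinations(range(6), 3))))  # 20 == C(6, 3)
```

The recursion copies the tail of the sequence at each level, which is where the extra cost for bigger combinations comes from.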
On 9/21/07, Gael Varoquaux
On Fri, Sep 21, 2007 at 01:52:31PM -0600, Charles R Harris wrote:
Go here, http://www.cs.utsa.edu/~wagner/knuth/. I think you want fascicle 4A, http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf. Some of the fascicles from Vol 4 of TAOCP are now in print, http://tinyurl.com/2goxpr.
:->. That's the best answer I have ever had so far: RTFAOCP !
:o)

OK, I'll have a look, but I'd be surprised he talks about loop free ways.

It's Knuth, he will talk about all ways known to man. Chuck
participants (4)
- Alan G Isaac
- Charles R Harris
- Gael Varoquaux
- Gary Ruben