randint for long type (permutations)
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions?
Call randint until you get enough bits of entropy to for a long with the appropriate number of bits.
def randwords(n): result = 0L for i in range(n): result = (result<<32)  randint(0,2<<321) return result
Kevin
On 6/14/07, Will Woods will.woods@ynic.york.ac.uk wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions? _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
Will Woods wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions?
Is it too slow or bandwidthconsuming to generate the permutations on the master node and then distribute the permuted arrays to the worker nodes? You can reduce that somewhat by permuting arange(N) as an index array and send that to each of the worker nodes.
On 14/06/07, Will Woods will.woods@ynic.york.ac.uk wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions?
A single integer might not be the best representation of a permutation (I can't see just now how you encode it, actually). Why not represent it as a tuple of n integers a_i with a_i<=i? (to get a permutation from this, treat a_i as an instruction to swap element i with element a_i). This should be (I haven't proven it but I'm pretty sure) a bijective representation of permutations. Not very compact, though I suppose you could use an array of 8bit integers for n<256. It will serve as a key in dictionaries, though, and converting from a permutation in some other representation (as returned by argsort?) to this shouldn't be too difficult. Converting these to longs is also not too difficult (sum(a_i[1:]*cumprod(i[1:])) nor is the reverse operation.
And of course generating the permutation randomly becomes easy.
Also, it's worth noting that if n is even moderately large, n! is such a staggering number that the probability you will ever generate the same permutation twice is less than the probability that your data will be modified undetectably in memory by cosmic rays.
Anne
The range of N I need is from 5100, which spans the highly likely to highly improbable for M around 100010000. The permutation can be derived from an integer using the algorithm here: http://en.wikipedia.org/wiki/Permutation
I have found the solution to the random number generator in the random module (I had only looked in numpy.random previously!):
getrandbits(k) Returns a python long int with k random bits. This method is supplied with the MersenneTwister generator and some other generators may also provide it as an optional part of the API. When available, getrandbits() enables randrange() to handle arbitrarily large ranges. New in version 2.4.
Thanks for all the replies.
Will
Anne Archibald wrote:
On 14/06/07, Will Woods will.woods@ynic.york.ac.uk wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions?
A single integer might not be the best representation of a permutation (I can't see just now how you encode it, actually). Why not represent it as a tuple of n integers a_i with a_i<=i? (to get a permutation from this, treat a_i as an instruction to swap element i with element a_i). This should be (I haven't proven it but I'm pretty sure) a bijective representation of permutations. Not very compact, though I suppose you could use an array of 8bit integers for n<256. It will serve as a key in dictionaries, though, and converting from a permutation in some other representation (as returned by argsort?) to this shouldn't be too difficult. Converting these to longs is also not too difficult (sum(a_i[1:]*cumprod(i[1:])) nor is the reverse operation.
And of course generating the permutation randomly becomes easy.
Also, it's worth noting that if n is even moderately large, n! is such a staggering number that the probability you will ever generate the same permutation twice is less than the probability that your data will be modified undetectably in memory by cosmic rays.
Anne _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
The range of N I need is from 5100, which spans the highly likely to highly improbable for M around 100010000. The permutation can be derived from an integer using the algorithm here: http://en.wikipedia.org/wiki/Permutation
I have found the solution to the random number generator in the random module (I had only looked in numpy.random previously!):
getrandbits(k) Returns a python long int with k random bits. This method is supplied with the MersenneTwister generator and some other generators may also provide it as an optional part of the API. When available, getrandbits() enables randrange() to handle arbitrarily large ranges. New in version 2.4.
Thanks for all the replies.
Will
Anne Archibald wrote:
On 14/06/07, Will Woods will.woods@ynic.york.ac.uk wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions?
A single integer might not be the best representation of a permutation (I can't see just now how you encode it, actually). Why not represent it as a tuple of n integers a_i with a_i<=i? (to get a permutation from this, treat a_i as an instruction to swap element i with element a_i). This should be (I haven't proven it but I'm pretty sure) a bijective representation of permutations. Not very compact, though I suppose you could use an array of 8bit integers for n<256. It will serve as a key in dictionaries, though, and converting from a permutation in some other representation (as returned by argsort?) to this shouldn't be too difficult. Converting these to longs is also not too difficult (sum(a_i[1:]*cumprod(i[1:])) nor is the reverse operation.
And of course generating the permutation randomly becomes easy.
Also, it's worth noting that if n is even moderately large, n! is such a staggering number that the probability you will ever generate the same permutation twice is less than the probability that your data will be modified undetectably in memory by cosmic rays.
Anne _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
Will Woods wrote:
The range of N I need is from 5100, which spans the highly likely to highly improbable for M around 100010000. The permutation can be derived from an integer using the algorithm here: http://en.wikipedia.org/wiki/Permutation
You really, really don't want to do it this way. 100! is a huge number that you *cannot* sample effectively. With the permutation algorithm given here, only the first fraction of the sequence will be shuffled at all.
In [29]: def perm(k, s): fact = 1 for j in range(2, len(s)+1): fact *= j1 i = j  ((k//fact) % j)  1 tmp = s[i], s[j1] s[j1], s[i] = tmp return s ....:
In [37]: s0 = range(100)
In [38]: print s0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
In [39]: print perm(10000000000000, s0[:]) [9, 12, 11, 4, 13, 14, 8, 7, 15, 5, 10, 0, 3, 1, 2, 6, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
Please, take my advice and shuffle the sequences on the master node using numpy.random.permutation() and distribute the sequences among the worker nodes.
Will Woods wrote:
The range of N I need is from 5100, which spans the highly likely to highly improbable for M around 100010000. The permutation can be derived from an integer using the algorithm here: http://en.wikipedia.org/wiki/Permutation
Actually, I take it back. It's not as bad as I thought it was. However, instead of cobbling together the bits, just use random.randrange().
On 6/14/07, Will Woods will.woods@ynic.york.ac.uk wrote:
I want to choose a subset of all possible permutations of a sequence of length N, with each element of the subset unique. This is then going to be scattered across multiple machines using mpi. Since there is a onetoone mapping between the integers in the range 0 <= x < N! and the possible permutations, one solution would be to choose M < N! integers randomly, check for uniqueness, and then scatter only the integers so that individual nodes can construct the permutations. However the integers need to be of type long, and randint doesn't work for numbers which cannot be converted to int. Any suggestions? ________
How big is N?
Chuck
participants (6)

Anne Archibald

Charles R Harris

Kevin Jacobs <jacobs＠bioinformed.com>

Robert Kern

Will Woods

Will Woods