random number generation in python compared to gsl
Hi all,
I have a question concerning the Mersenne Twister random number generation in numpy: when I seed it with 0, I get a different sequence of numbers in numpy, compared to GSL.
In numpy:
r = numpy.Random.RandomState(seed=0) r.uniform(size=5) > array([ 0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ])
whereas in GSL the first numbers are 0.99974 0.16291 0.2826 0.94720 0.23166
Matlab gives the same result as numpy...
I have translated some python code to c, and would like to debug it  therefore, I would like to have exactly the same set of random numbers... How can I provoke this ?
Best.
Giovanni
On Wed, Nov 5, 2008 at 11:30 AM, Giovanni Samaey giovanni.samaey@cs.kuleuven.be wrote:
Hi all, I have a question concerning the Mersenne Twister random number generation in numpy: when I seed it with 0, I get a different sequence of numbers in numpy, compared to GSL. In numpy: r = numpy.Random.RandomState(seed=0) r.uniform(size=5) > array([ 0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ]) whereas in GSL the first numbers are 0.99974 0.16291 0.2826 0.94720 0.23166 Matlab gives the same result as numpy... I have translated some python code to c, and would like to debug it  therefore, I would like to have exactly the same set of random numbers... How can I provoke this ? Best. Giovanni
Hi, how about other seed values ? I thought seed=0, is (often) used to mean a "random", i.e. current time or alike, seed value ... !?
Sebastian Haase
Sebastian Haase wrote:
Hi, how about other seed values ? I thought seed=0, is (often) used to mean a "random", i.e. current time or alike, seed value ... !?
Not really. A fixed seed means you will always get the exact same serie of numbers. The seed is the initial condition of your random generator, and a random generator is a totally predictable, deterministic process. The (pseudo) randomness is a consequence of having a random, unknown seed, causing the serie to behave seemingly random to most statistical tests. This can be time, keystrokes, etc...
cheers,
David
Hi, how about other seed values ? I thought seed=0, is (often) used to mean a "random", i.e. current time or alike, seed value ... !?
Not in this case: I always get the same sequence with seed=0 (different for both implementation, but the same each time I run it.) I got around it by installing pygsl and taking random numbers from there instead of from numpy.
But I still find it strange to get two different sequences from two implementation that claim to be the same algorithm...
Giovanni
Sebastian Haase _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
Not in this case: I always get the same sequence with seed=0 (different for both implementation, but the same each time I run it.) I got around it by installing pygsl and taking random numbers from there instead of from numpy.
But I still find it strange to get two different sequences from two implementation that claim to be the same algorithm...
Giovanni
Hi,
I didn't check which MT was used, because there are several different. MT is based on one Mersenne prime, but for instance the Boost library wraps two generators with two different values. Perhaps the same occurs here. Other explanations include:  MT must use an array of starting values, perhaps the first is 0, but one is 0, 1, 2, 3, ... and the other uses 0, 0, 0, ...  MT generates integers, perhaps there is a difference between the size of the generated integers (not likely) or a difference in the transformation into a float ?
Matthieu
On Wed, Nov 05, 2008 at 03:19:09PM +0100, Matthieu Brucher wrote:
Not in this case: I always get the same sequence with seed=0 (different for both implementation, but the same each time I run it.) I got around it by installing pygsl and taking random numbers from there instead of from numpy.
But I still find it strange to get two different sequences from two implementation that claim to be the same algorithm...
Giovanni
Hi,
I didn't check which MT was used, because there are several different. MT is based on one Mersenne prime, but for instance the Boost library wraps two generators with two different values. Perhaps the same occurs here. Other explanations include:
 MT must use an array of starting values, perhaps the first is 0, but
one is 0, 1, 2, 3, ... and the other uses 0, 0, 0, ...
This is the key, there is a lot of variation in the seeding algorithm that is used in mersenne twister algorithms, usually they use some kind of linear congruential algorithm to get started. I think that GSL uses the published mersenne twister seed algorithm, what you can do is find out how gsl does this and just set the full seed array in numpy.random.RandomState yourself. I have done this for comparison with R's mersenne twister algorithm using my own MT implementation and it works like a charm.
Gabriel
On Wed, Nov 5, 2008 at 08:05, Giovanni Samaey giovanni.samaey@cs.kuleuven.be wrote:
Hi, how about other seed values ? I thought seed=0, is (often) used to mean a "random", i.e. current time or alike, seed value ... !?
Not in this case: I always get the same sequence with seed=0 (different for both implementation, but the same each time I run it.) I got around it by installing pygsl and taking random numbers from there instead of from numpy.
But I still find it strange to get two different sequences from two implementation that claim to be the same algorithm...
GSL has this bit of code:
if (s == 0) s = 4357; /* the default seed is 4357 */
We don't. Otherwise, I believe the two seeding algorithms are identical.
Dear Robert,
indeed, this is the difference ! Thanks ! Seeding numpy with 4357 gives identical sequences...
Giovanni
On 05 Nov 2008, at 19:01, Robert Kern wrote:
On Wed, Nov 5, 2008 at 08:05, Giovanni Samaey giovanni.samaey@cs.kuleuven.be wrote:
Hi, how about other seed values ? I thought seed=0, is (often) used to mean a "random", i.e. current time or alike, seed value ... !?
Not in this case: I always get the same sequence with seed=0 (different for both implementation, but the same each time I run it.) I got around it by installing pygsl and taking random numbers from there instead of from numpy.
But I still find it strange to get two different sequences from two implementation that claim to be the same algorithm...
GSL has this bit of code:
if (s == 0) s = 4357; /* the default seed is 4357 */
We don't. Otherwise, I believe the two seeding algorithms are identical.
 Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."  Umberto Eco _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpydiscussion
participants (6)

David Cournapeau

Gabriel Gellner

Giovanni Samaey

Matthieu Brucher

Robert Kern

Sebastian Haase