Hello,
I have been reading that there may be potential issues with the Box-Muller transform, which is used by the numpy.random.normal() function. Supposedly, since the two outputs of the transform are computed from the same pair of uniform variates (x1, x2), the corresponding elements of the resulting distribution also won't be independent. For example, see "Stochastic Simulation" by Ripley, pages 54-59, where the random values end up distributed on a spiral. Note that the authors mention that they only looked at "congruential generators." Is the random number generator used by numpy congruential?
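For reference, the transform itself is short. This is a minimal sketch (not numpy's actual implementation) that makes the coupling visible: both outputs are functions of the same uniform pair, so correlations between successive uniforms can leak into the normal pairs.

```python
import math
import random

def box_muller(u1, u2):
    # r and theta are both derived from the same uniform pair (u1, u2),
    # which is why structure in the uniform stream can show up in the
    # output pair (z1, z2).
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

u1 = 1.0 - random.random()  # shift to (0, 1] so log(u1) is defined
u2 = random.random()
z1, z2 = box_muller(u1, u2)
```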
I have tried to generate plots that demonstrate this problem, but have come up short. For example:
import numpy, pylab

nsamples = 10**6
n = numpy.random.normal(0.0, 1.0, nsamples)
# use open-ended slices so both halves have the same length
pylab.scatter(n[0::2], n[1::2], 0.1)
pylab.show()
I can zoom in and out, and the scatter still looks random (white noise -- almost like TV static). Does this prove that there is no problem? And if so, why does numpy do a better job than what Ripley demonstrated?
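A scatter plot is only a visual check. One quick numeric complement (a sketch, not a rigorous test battery) is the sample correlation between the two members of each pair, which should be on the order of 1/sqrt(nsamples) for independent draws:

```python
import numpy

nsamples = 10**6
n = numpy.random.normal(0.0, 1.0, nsamples)

# Correlation between the two halves of each consecutive pair; for
# independent samples this should be close to zero.
corr = numpy.corrcoef(n[0::2], n[1::2])[0, 1]
print(corr)
```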
Regards, Mike Gilbert
I think the use of a correct uniform generator will allow a good normal distribution. Congruential generators are very basic generators; everyone knows they should not be used. I think numpy uses a Mersenne Twister generator, for which you can generate "independent" vectors with several hundred values.
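This is easy to confirm: the legacy numpy.random interface exposes its generator state, and the first element of the state tuple names the underlying bit generator.

```python
import numpy

# The first element of the RandomState state tuple identifies the
# generator algorithm.
state = numpy.random.get_state()
print(state[0])  # MT19937, i.e. the Mersenne Twister
```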
Matthieu
2008/12/10 Michael Gilbert michael.s.gilbert@gmail.com:
Wed, 10 Dec 2008 14:03:39 -0500, Michael Gilbert wrote:
I'm not an expert, but the generator used by numpy is the Mersenne Twister, which should be quite good for many uses. I'd guess what you mention is a way to illustrate that the output of linear congruential generators has serial correlations. At least according to Wikipedia, these are negligible in the Mersenne Twister's output.
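The serial-correlation problem can be made concrete with a small sketch using RANDU, a classic low-quality linear congruential generator (this is purely an illustration; nothing in numpy uses it):

```python
import math

def lcg(seed, a=65539, c=0, m=2**31):
    # RANDU, a notoriously poor linear congruential generator, used
    # here only to make Box-Muller artefacts visible.
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

g = lcg(1)
pairs = []
for _ in range(5000):
    u1, u2 = next(g), next(g)
    r = math.sqrt(-2.0 * math.log(1.0 - u1))
    pairs.append((r * math.cos(2.0 * math.pi * u2),
                  r * math.sin(2.0 * math.pi * u2)))
# Scatter-plotting `pairs` (e.g. with pylab.scatter) reveals
# lattice/spiral structure that a good generator does not produce.
```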
On Wed, Dec 10, 2008 at 12:03 PM, Michael Gilbert <michael.s.gilbert@gmail.com> wrote:
Bruce Carneal did some tests of robustness and speed for various normal generators. I don't know what his final tests showed for Box-Muller. IIRC, it had some failures but nothing spectacular. The tests were pretty stringent: they used the erf to turn the normal distribution into a uniform distribution and then ran the crush tests on the latter. You could send him a note and ask: bcarneal@gmail.com. Here are the timings he got:
In what follows the uniform variate generators are:

lcg64
mwc8222
mt19937
mt19937_64
yarn5

And the normal distribution codes are:

trng - default normal distribution code in TRNG
boxm - Box-Muller, mtrand lookalike, remembers/uses 2nd value
zig7 - a 'Harris' ziggurat indexed by 7 bits
zig8 - a 'Harris' ziggurat indexed by 8 bits
zig9 - a 'Harris' ziggurat indexed by 9 bits
Here are the numbers in more detail:
# Timings from icc -O2 running on 2.4 GHz Core 2

lcg64 trng: 6.52459e+06 ops per second
lcg64 boxm: 2.18453e+07 ops per second
lcg64 zig7: 1.80616e+08 ops per second
lcg64 zig8: 2.01865e+08 ops per second
lcg64 zig9: 2.05156e+08 ops per second

mwc8222 trng: 6.52459e+06 ops per second
mwc8222 boxm: 2.08787e+07 ops per second
mwc8222 zig7: 9.44663e+07 ops per second
mwc8222 zig8: 1.05326e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second

mt19937 trng: 6.41112e+06 ops per second
mt19937 boxm: 1.64986e+07 ops per second
mt19937 zig7: 4.23762e+07 ops per second
mt19937 zig8: 4.52623e+07 ops per second
mt19937 zig9: 4.52623e+07 ops per second

mt19937_64 trng: 6.42509e+06 ops per second
mt19937_64 boxm: 1.93226e+07 ops per second
mt19937_64 zig7: 5.8762e+07 ops per second
mt19937_64 zig8: 6.17213e+07 ops per second
mt19937_64 zig9: 6.29146e+07 ops per second

yarn5 trng: 5.95781e+06 ops per second
yarn5 boxm: 1.19156e+07 ops per second
yarn5 zig7: 1.48945e+07 ops per second
yarn5 zig8: 1.54809e+07 ops per second
yarn5 zig9: 1.53201e+07 ops per second
# Timings from g++ -O2 running on a 2.4 GHz Core 2

lcg64 trng: 6.72163e+06 ops per second
lcg64 boxm: 1.50465e+07 ops per second
lcg64 zig7: 1.31072e+08 ops per second
lcg64 zig8: 1.48383e+08 ops per second
lcg64 zig9: 1.6036e+08 ops per second

mwc8222 trng: 6.64215e+06 ops per second
mwc8222 boxm: 1.44299e+07 ops per second
mwc8222 zig7: 8.903e+07 ops per second
mwc8222 zig8: 1.00825e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second

mt19937 trng: 6.52459e+06 ops per second
mt19937 boxm: 1.28223e+07 ops per second
mt19937 zig7: 5.00116e+07 ops per second
mt19937 zig8: 5.41123e+07 ops per second
mt19937 zig9: 5.47083e+07 ops per second

mt19937_64 trng: 6.58285e+06 ops per second
mt19937_64 boxm: 1.42988e+07 ops per second
mt19937_64 zig7: 6.72164e+07 ops per second
mt19937_64 zig8: 7.39591e+07 ops per second
mt19937_64 zig9: 7.46022e+07 ops per second

yarn5 trng: 6.25144e+06 ops per second
yarn5 boxm: 8.93672e+06 ops per second
yarn5 zig7: 1.50465e+07 ops per second
yarn5 zig8: 1.57496e+07 ops per second
yarn5 zig9: 1.56038e+07 ops per second
Chuck
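The erf-based reduction described above can be sketched in a few lines: the standard normal CDF, written with erf, maps N(0,1) samples back to uniforms, which can then be fed to a uniformity test battery such as TestU01. This is only the transform step, not the crush tests themselves; the printed moments are just a sanity check.

```python
import math
import numpy

def normal_to_uniform(z):
    # Standard normal CDF via erf: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    # Applied to N(0,1) samples, it yields samples uniform on (0, 1).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = numpy.random.normal(0.0, 1.0, 10**5)
u = numpy.array([normal_to_uniform(x) for x in z])

# Uniform(0,1) has mean 1/2 and variance 1/12.
print(u.mean(), u.var())
```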
Thanks for all the insightful replies. This gives me better confidence in numpy's normal distribution. I will contact Bruce Carneal to get more details.
Thanks again, Mike Gilbert