[Numpy-discussion] Adopt Mersenne Twister 64bit?

Robert Kern robert.kern at gmail.com
Mon Mar 11 05:46:54 EDT 2013


On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam <siu at continuum.io> wrote:
> Hi all,
>
> I am redirecting a discussion on github issue tracker here.  My original
> post (https://github.com/numpy/numpy/issues/3137):
>
> "The current implementation of the RNG seems to be MT19937-32. Since 64-bit
> machines are common nowadays, I am suggesting adding or upgrading to
> MT19937-64.  Thoughts?"
>
> Let me start by responding to njsmith's comments on the issue tracker:
>
> Would it be faster?
>
>
> Although I have not benchmarked the 64-bit implementation, it is likely
> to be faster on a 64-bit machine, since the number of iterations
> (controlled by NN and MM in the reference implementation
> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c)
> is reduced by half.  In addition, each generation in the 64-bit
> implementation produces a 64-bit random int, which can be used to generate
> a double-precision random number, unlike the 32-bit implementation, which
> requires generating a pair of 32-bit random ints.

From the last time this was brought up, it looks like getting a single
64-bit integer out from MT19937-64 takes about the same amount of time
as getting a single 32-bit integer from MT19937-32, perhaps a little
slower, even on a 64-bit machine.

http://comments.gmane.org/gmane.comp.python.numeric.general/27773

So getting a single double would be not quite twice as fast: it takes
one 64-bit draw instead of two 32-bit draws, but each draw is perhaps a
little slower.
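
The arithmetic behind that estimate can be sketched as follows. This
assumes the 53-bit conversions used in the reference C implementations
(two 32-bit draws shifted down to 27 and 26 bits, versus one 64-bit draw
shifted down to 53 bits). The function names are illustrative and are not
part of numpy.

```python
TWO_POW_26 = 67108864.0           # 2**26
TWO_POW_53 = 9007199254740992.0   # 2**53


def double_from_two_uint32(a, b):
    """Build a double in [0, 1) from two 32-bit draws (27 + 26 = 53 bits)."""
    hi = a >> 5   # keep the top 27 bits of the first draw
    lo = b >> 6   # keep the top 26 bits of the second draw
    return (hi * TWO_POW_26 + lo) / TWO_POW_53


def double_from_uint64(x):
    """Build a double in [0, 1) from one 64-bit draw (top 53 bits)."""
    return (x >> 11) / TWO_POW_53
```

Both routes yield 53 random bits per double; the 64-bit route just needs
one call to the core generator instead of two.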

> But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit
> instructions; thus, it is likely to be slower.  (1)
>
> Use less memory?
>
>
> The amount of memory use will remain the same.  The size of the RNG state is
> the same.
>
> Provide higher quality randomness?
>
>
> My naive answer is that the 32-bit and 64-bit implementations have the same
> 2^19937-1 period. I need to do some research and experiments.
>
> Would it change the output of this program?
>
>         import numpy
>         numpy.random.seed(0)
>         print numpy.random.random()
>
>
> Unfortunately, yes.  The 64-bit implementation generates a different random
> number sequence with the same seed. (2)
>
>
> My suggestion to overcome (1) and (2) is to allow the user to select between
> the two implementations (and possibly different algorithms in the future).
> If the user does not provide a choice, we use MT19937-32 by default.
>
>         numpy.random.set_state("MT19937_64", …)   # choose the 64-bit implementation

Most likely, the different PRNGs should be different subclasses of
RandomState. The module-level convenience API should probably be left
alone. If you need to control the PRNG that you are using, you really
need to be passing around a RandomState instance and not relying on
reseeding the shared global instance. Aside: I really wish we hadn't
exposed `set_state()` in the module API. It's an attractive nuisance.
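
A minimal sketch of that pattern, using only the existing
`numpy.random.RandomState` class. The `simulate` helper is hypothetical
and stands in for user code that takes its generator as an argument.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical user code: draw from the generator it is handed."""
    return rng.random_sample(n).mean()


rng_a = np.random.RandomState(12345)
rng_b = np.random.RandomState(12345)

first = simulate(1000, rng_a)

# Unrelated code reseeding and consuming the shared global instance.
np.random.seed(0)
np.random.random_sample()

second = simulate(1000, rng_b)

# The private streams are unaffected by what happened to the global one.
assert first == second
```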

There is some low-level C work that needs to be done to allow the
non-uniform distributions to be shared between implementations of the
core uniform PRNG, but that's the same no matter how you organize the
upper layer.
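
To illustrate the layering only, here is a pure-Python sketch in which the
non-uniform distributions live in a base class and each subclass supplies
just the core uniform generator. All class names are hypothetical, this is
not numpy's actual design, and the second core is a toy 64-bit LCG standing
in for another algorithm (it is not MT19937-64).

```python
import math
import random

MASK_64 = (1 << 64) - 1


class BaseRandomState:
    """Hypothetical base: distributions shared across core generators."""

    def random_sample(self):
        """Return a uniform double in [0, 1); supplied by subclasses."""
        raise NotImplementedError

    def exponential(self, scale=1.0):
        # Inverse-transform sampling; 1 - u lies in (0, 1], so log is defined.
        return -scale * math.log(1.0 - self.random_sample())


class MT19937State(BaseRandomState):
    """Core uniform from CPython's random.Random, which is MT19937."""

    def __init__(self, seed):
        self._core = random.Random(seed)

    def random_sample(self):
        return self._core.random()


class ToyLCG64State(BaseRandomState):
    """Toy 64-bit LCG core, for illustration only."""

    def __init__(self, seed):
        self._state = seed & MASK_64

    def random_sample(self):
        self._state = (6364136223846793005 * self._state
                       + 1442695040888963407) & MASK_64
        return (self._state >> 11) / 9007199254740992.0  # top 53 bits / 2**53
```

Code that only needs `exponential()` can then accept either subclass
without knowing which core generator is behind it.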

--
Robert Kern


