Adopt Mersenne Twister 64-bit?

Hi all,
I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137):
"The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts?"
Let me start by answering njsmith's comments on the issue tracker:
Would it be faster?
Although I have not benchmarked the 64-bit implementation, it is likely to be faster on a 64-bit machine since the number of iterations (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. In addition, each generation in the 64-bit implementation produces a 64-bit random int, which can be used to generate a double-precision random number, unlike the 32-bit implementation, which requires generating a pair of 32-bit random ints.
But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1)
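To make the "one draw per double" point concrete, here is a sketch (not NumPy's internals) of how a 53-bit double is assembled from raw Mersenne Twister output, following the genrand_res53 and genrand64_real2 recipes in the reference implementations:

```python
def double_from_two_32bit(x, y):
    """Combine two 32-bit draws into one double in [0, 1)."""
    a = x >> 5            # keep 27 high bits of the first draw
    b = y >> 6            # keep 26 high bits of the second draw
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)  # 2**26, 2**53

def double_from_one_64bit(x):
    """One 64-bit draw already carries enough bits for a double."""
    return (x >> 11) * (1.0 / 9007199254740992.0)  # keep 53 high bits

# Both map onto the same [0, 1) grid with 53-bit resolution:
print(double_from_two_32bit(0xFFFFFFFF, 0xFFFFFFFF))  # just below 1.0
print(double_from_one_64bit(0xFFFFFFFFFFFFFFFF))      # just below 1.0
```

So the 64-bit variant consumes one generator step per double where the 32-bit variant consumes two.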
Use less memory?
The amount of memory use will remain the same. The size of the RNG state is the same.
Provide higher quality randomness?
My naive answer is that the 32-bit and 64-bit implementations have the same 2^19937-1 period. I need to do some research and experiments.
Would it change the output of this program: `import numpy; numpy.random.seed(0); print numpy.random.random()`?
Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2)
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Thoughts?
Best, Siu

On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
Hi all,
I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137):
"The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts?"
Let me start by answering njsmith's comments on the issue tracker:
Would it be faster?
Although I have not benchmarked the 64-bit implementation, it is likely to be faster on a 64-bit machine since the number of iterations (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. In addition, each generation in the 64-bit implementation produces a 64-bit random int, which can be used to generate a double-precision random number, unlike the 32-bit implementation, which requires generating a pair of 32-bit random ints.
From the last time this was brought up, it looks like getting a single
64-bit integer out from MT19937-64 takes about the same amount of time as getting a single 32-bit integer from MT19937-32, perhaps a little slower, even on a 64-bit machine.
http://comments.gmane.org/gmane.comp.python.numeric.general/27773
So getting a single double would be not quite twice as fast.
But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1)
Use less memory?
The amount of memory use will remain the same. The size of the RNG state is the same.
Provide higher quality randomness?
My naive answer is that the 32-bit and 64-bit implementations have the same 2^19937-1 period. I need to do some research and experiments.
Would it change the output of this program: `import numpy; numpy.random.seed(0); print numpy.random.random()`?
Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2)
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
There is some low-level C work that needs to be done to allow the non-uniform distributions to be shared between implementations of the core uniform PRNG, but that's the same no matter how you organize the upper layer.
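A toy sketch of the layering described here, with the non-uniform machinery written once against a small core-uniform interface that each PRNG subclass supplies. All class and method names below are invented for illustration (the stand-in core is a trivial LCG, not MT19937):

```python
import math

class CorePRNG:
    """Shared upper layer: non-uniform distributions built on random_double()."""
    def random_double(self):
        raise NotImplementedError
    def standard_exponential(self):
        # Inverse-CDF transform; works unchanged for any core uniform generator.
        return -math.log(1.0 - self.random_double())

class LCG32(CorePRNG):
    """Stand-in core generator supplying only the uniform layer."""
    def __init__(self, seed=0):
        self.state = seed & 0xFFFFFFFF
    def random_double(self):
        # Numerical Recipes LCG constants; purely illustrative quality.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state / 2.0**32

rng = LCG32(seed=12345)
print(rng.standard_exponential())   # non-uniform draw via the shared layer
```

Swapping in a different core (say, a 64-bit twister) would then require no changes to the non-uniform layer, which is the point of the C-level refactoring.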
-- Robert Kern

On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
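A minimal illustration of the hazard being discussed: code that consumes the shared global stream gives answers that depend on what ran before it, while a locally owned RandomState does not touch shared state at all.

```python
import numpy as np

def fragile_check():
    # Relies on the shared global stream; its answer depends on every
    # other consumer of np.random that ran earlier.
    return np.random.random()

np.random.seed(0)
a = fragile_check()
b = fragile_check()   # same call, different answer: the global state moved
assert a != b

def robust_check():
    # Owns its state: immune to whatever other code did to np.random.
    return np.random.RandomState(0).random_sample()

assert robust_check() == robust_check()   # identical no matter what ran before
```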
-n

On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
(To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.)
-n

On Tue, Mar 12, 2013 at 5:27 PM, Nathaniel Smith njs@pobox.com wrote:
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
Here is a recipe how to use it http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052911.html
(I'm just drawing a random number as seed that I can save, instead of the entire state.)
Josef
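The recipe Josef describes can be sketched as follows: draw one integer, use it as the seed, and record just that integer instead of pickling the full 624-word MT state.

```python
import numpy as np

seed = np.random.randint(0, 2**31 - 1)   # the only thing you need to record
rng = np.random.RandomState(seed)
run1 = rng.random_sample(3)

# Later, reproduce the exact same draws from the saved seed:
rng_replay = np.random.RandomState(seed)
run2 = rng_replay.random_sample(3)
assert (run1 == run2).all()
```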
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
(To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.)
-n
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Nathaniel Smith wrote:
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
(To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.)
-n
What do you mean that the idea of global shared state is a bad one? How would you prefer the API to look? An alternative is a stateless rng, where you have to pass it its state on each invocation, which it would update and return. I hope you're not advocating that.

On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbecker2@gmail.com wrote:
Nathaniel Smith wrote:
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
(To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.)
-n
What do you mean that the idea of global shared state is a bad one?
The words "global shared state" drive fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea.
How would you prefer the API to look?
There are two current APIs:
1. Instantiate RandomState and call its methods
2. Just call the functions in numpy.random
The latter has a shared global state. In fact, all of those "functions" are just references to the methods on a shared global RandomState instance.
We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API.
An alternative is a stateless rng, where you have to pass it its state on each invocation, which it would update and return. I hope you're not advocating that.
No. This is a place where OOP solved the problem neatly.
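A common idiom for the OOP approach: simulation code accepts a RandomState instance instead of touching the global one, so the caller controls reproducibility. (`noisy_measurement` is an invented example function, not a NumPy API.)

```python
import numpy as np

def noisy_measurement(n, prng=None):
    if prng is None:                      # fall back to a fresh, unseeded stream
        prng = np.random.RandomState()
    return prng.normal(loc=0.0, scale=1.0, size=n)

# Reproducible: the caller owns the state.
x = noisy_measurement(5, prng=np.random.RandomState(42))
y = noisy_measurement(5, prng=np.random.RandomState(42))
assert (x == y).all()
```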

I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas.

Neal Becker wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas.
I think if I could just grab a long int from the underlying mersenne twister, through some c api?

Neal Becker wrote:
Neal Becker wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
The c++ code could invoke via the python api, but that might be slower. I'm just rambling here, I'd have to see the API to get some ideas.
I think if I could just grab a long int from the underlying mersenne twister, through some c api?
Well, this at least appears to work - probably not the most efficient approach - calls the RandomState object via the python interface to get 4 bytes at a time:
int test1 (bp::object & rs)
{
    bp::str bytes = call_method<bp::str> (rs.ptr(), "bytes", 4); // get 4 bytes
    return *reinterpret_cast<int*> (PyString_AS_STRING (bytes.ptr()));
}

BOOST_PYTHON_MODULE (numpy_rand)
{
    boost::numpy::initialize();
    def ("test1", &test1);
}
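For reference, the data flow of that C++ snippet can be reproduced in pure Python: pull 4 bytes from a RandomState and reinterpret them as a native int (the C++ version exists to feed the same stream into boost::random without the interpreter round-trip per use).

```python
import struct
import numpy as np

rs = np.random.RandomState(0)
raw = rs.bytes(4)                    # the same call the C++ makes via the Python API
value, = struct.unpack('i', raw)     # the reinterpret_cast<int*> equivalent
print(value)
```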

On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbecker2@gmail.com wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/C++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule.

Robert Kern wrote:
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbecker2@gmail.com wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/C++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule.
I did find a way to do this, and the results are good enough. Timing is quite comparable to my pure c++ implementation.
I used rk_ulong from mtrand.so. I also tried using rk_fill, but it was a bit slower.
The boost::python c++ code is attached, for posterity.

Robert Kern wrote:
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbecker2@gmail.com wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/C++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule.
One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem, as I see it, is that to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export.
The only way to do it is in c++ code:
1. import numpy.random
2. get the RandomState class
3. call it to create a RandomState instance
4. get the ob_type pointer.
Pretty ugly:
object mod = object (handle<> (borrowed (PyImport_ImportModule ("numpy.random"))));
object rs_obj = mod.attr ("RandomState");
object rs_inst = call<object> (rs_obj.ptr(), 0);
RandomStateTypeObj = rs_inst.ptr()->ob_type;

On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbecker2@gmail.com wrote:
Robert Kern wrote:
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbecker2@gmail.com wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/C++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule.
One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem, as I see it, is that to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export.
The only way to do it is in c++ code:
- import numpy.random
- get RandomState class
- call it to create RandomState instance
- get the ob_type pointer.
Pretty ugly:
object mod = object (handle<> (borrowed (PyImport_ImportModule ("numpy.random"))));
object rs_obj = mod.attr ("RandomState");
object rs_inst = call<object> (rs_obj.ptr(), 0);
RandomStateTypeObj = rs_inst.ptr()->ob_type;
PyObject_IsInstance() should be sufficient.
http://docs.python.org/2/c-api/object.html#PyObject_IsInstance
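For comparison, the Python-level counterpart of a PyObject_IsInstance check is a plain isinstance test against the RandomState class, with no need to dig out the type object by hand:

```python
import numpy as np

rs = np.random.RandomState(0)
assert isinstance(rs, np.random.RandomState)        # accepted by the wrapper
assert not isinstance(object(), np.random.RandomState)  # rejected
```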
-- Robert Kern

Robert Kern wrote:
On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker ndbecker2@gmail.com wrote:
Robert Kern wrote:
On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker ndbecker2@gmail.com wrote:
I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that.
There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/C++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule.
One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem, as I see it, is that to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export.
The only way to do it is in c++ code:
- import numpy.random
- get RandomState class
- call it to create RandomState instance
- get the ob_type pointer.
Pretty ugly:
object mod = object (handle<> (borrowed (PyImport_ImportModule ("numpy.random"))));
object rs_obj = mod.attr ("RandomState");
object rs_inst = call<object> (rs_obj.ptr(), 0);
RandomStateTypeObj = rs_inst.ptr()->ob_type;
PyObject_IsInstance() should be sufficient.
http://docs.python.org/2/c-api/object.html#PyObject_IsInstance
-- Robert Kern
Thanks! For the record, an updated version attached.

On Tue, Mar 12, 2013 at 7:10 PM, Robert Kern robert.kern@gmail.com wrote:
On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker ndbecker2@gmail.com wrote:
Nathaniel Smith wrote:
On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith njs@pobox.com wrote:
On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern robert.kern@gmail.com wrote:
On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam siu@continuum.io wrote:
My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default.
numpy.random.set_state("MT19937_64", …)  # choose the 64-bit implementation
Most likely, the different PRNGs should be different subclasses of RandomState. The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance.
+1
Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance.
And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347
I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers' expectations about how APIs work...
(To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.)
-n
What do you mean that the idea of global shared state is a bad one?
The words "global shared state" drive fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea.
How would you prefer the API to look?
There are two current APIs:
- Instantiate RandomState and call its methods
- Just call the functions in numpy.random
The latter has a shared global state. In fact, all of those "functions" are just references to the methods on a shared global RandomState instance.
We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API.
I never saw much advertising for the RandomState api, and until recently wasn't sure why using the global random state function np.random.norm, ... should be a bad idea.
Learning by example, and seeing almost all examples using the global state, is not exactly conducive to figuring out that there is an issue.
All of scipy.stats.distribution random numbers are using the global random state. (I guess I should open a ticket.)
Josef
An alternative is a stateless rng, where you have to pass it its state on each invocation, which it would update and return. I hope you're not advocating that.
No. This is a place where OOP solved the problem neatly.
-- Robert Kern
participants (5)
- josef.pktd@gmail.com
- Nathaniel Smith
- Neal Becker
- Robert Kern
- Siu Kwan Lam