stability of numpy.random.RandomState API?
I'm just about to embark on a long-term research project and was planning to use numpy.random to generate stimuli for our experiments. We plan to store only the parameters and RandomState seed for each stimulus and I'm concerned about stability of the API in the long term: will the parameters and random seed we store now work with future versions of numpy.random? I think I recall that there was a change in the random seed format some time around numpy 1.0.
Thanks,
Barry
On Thu, Nov 6, 2008 at 14:05, Barry Wark <barrywark@gmail.com> wrote:
> I'm just about to embark on a long-term research project and was planning to use numpy.random to generate stimuli for our experiments. We plan to store only the parameters and RandomState seed for each stimulus and I'm concerned about stability of the API in the long term: will the parameters and random seed we store now work with future versions of numpy.random?
It should. But just in case, make sure you explicitly instantiate RandomState objects instead of using the functions in numpy.random. That way, should we need to fix some bug that might change the results, you can always pull out the current mtrand code and use it independently.
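A minimal sketch of that distinction, assuming a seed of 12345 and a helper named `make_stimulus` (both illustrative, not from this thread): the stimulus is drawn from its own RandomState instance rather than from the module-level functions, which all share one hidden global state.

```python
import numpy as np


def make_stimulus(seed, n_samples):
    """Draw a uniform noise stimulus from a private RandomState.

    `make_stimulus` and its parameters are hypothetical names used for
    illustration only.
    """
    rng = np.random.RandomState(seed)  # explicit, self-contained generator
    return rng.uniform(-1.0, 1.0, size=n_samples)


# The same seed yields the same samples, even when other code uses the
# module-level numpy.random functions in between.
first = make_stimulus(12345, 1000)
np.random.seed(0)           # disturbs only the global state
np.random.uniform(size=10)
second = make_stimulus(12345, 1000)
assert np.array_equal(first, second)
```

Because the generator lives in the instance, the stored seed plus the parameters fully determine the call sequence that has to be replayed.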
> I think I recall that there was a change in the random seed format some time around numpy 1.0.
I don't think I changed it after 1.0. Before 1.0, we explicitly warned people about API instability.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Nov 6, 2008 at 12:09 PM, Robert Kern <robert.kern@gmail.com> wrote:
> It should. But just in case, make sure you explicitly instantiate RandomState objects instead of using the functions in numpy.random. That way, should we need to fix some bug that might change the results, you can always pull out the current mtrand code and use it independently.
That is our working plan, as well as to record the numpy.__version__ which was used to generate the original stimulus. Thanks for the confirmation.
On a side note, this seems like a potentially big issue for many scientific users. Perhaps making a policy of keeping incompatible revisions to RandomState noted in its documentation (if they ever come up) would be useful. Even better, a module function or class method that returns an instance of RandomState as it was at a particular numpy version:
r = numpy.random.RandomState.from_version(my_numpy_version, seed=None)
Hmm. Sounds like a bit of work. I'll give it a go, if you think this is a valuable approach.
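As a rough sketch of the bookkeeping half of that plan (seed, parameters, and the numpy version stored together), something like the following could work. The function name and field names are assumptions made for illustration. The `from_version` constructor is only a proposal in this thread and is not used here.

```python
import json

import numpy as np


def stimulus_record(seed, params):
    """Bundle everything needed to attempt regeneration later.

    The field names are illustrative; nothing here is a NumPy convention.
    """
    return {
        "seed": int(seed),
        "params": params,
        "numpy_version": np.__version__,
    }


record = stimulus_record(12345, {"distribution": "uniform", "n_samples": 1000})
serialized = json.dumps(record, sort_keys=True)

# Round-trip check: the stored record can be read back unchanged.
assert json.loads(serialized) == record
```

Storing the version only documents which code produced the stimulus; it does not by itself make that code reproducible on a later machine.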
> I don't think I changed it after 1.0. Before 1.0, we explicitly warned people about API instability.
I believe you. We've been developing this app since before numpy 1.0, so I'm sure the issue cropped up from data generated pre-1.0.
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Nov 6, 2008 at 15:12, Barry Wark <barrywark@gmail.com> wrote:
> I believe you. We've been developing this app since before numpy 1.0, so I'm sure the issue cropped up from data generated pre-1.0.
Okay. Actually, now that I think about it, there have been changes that would affect results using the nonuniform distributions. These should only have arisen from fixing bugs (i.e. the previous results were wrong, not just different). Do you have any thoughts on how you would want us to handle that case?
-- Robert Kern
On Thu, Nov 6, 2008 at 1:55 PM, Robert Kern <robert.kern@gmail.com> wrote:
> Okay. Actually, now that I think about it, there have been changes that would affect results using the nonuniform distributions. These should only have arisen from fixing bugs (i.e. the previous results were wrong, not just different). Do you have any thoughts on how you would want us to handle that case?
In our usage (neural physiology), we've recorded the physiological response to a given stimulus. So being able to recover the _exact_ original stimulus that produced the recorded data is critical. This is why I suggested an API which would let us get an instance of the RandomState as it was at a particular revision (including bugs) so that we could regenerate the exact original sequence. Obviously, we're happy to have the bug fixes in, and continue to use the current RandomState for new experiments.
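One way to at least detect whether a regenerated stimulus is the exact original is to store a digest of the samples next to the seed. This is only a sketch under assumptions: the helper name `stimulus_digest` and the choice of SHA-256 are illustrative, and the check detects a mismatch but cannot repair one.

```python
import hashlib

import numpy as np


def stimulus_digest(stimulus):
    """Return a SHA-256 hex digest of the array's dtype, shape, and raw bytes."""
    arr = np.ascontiguousarray(stimulus)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode("ascii"))
    h.update(str(arr.shape).encode("ascii"))
    h.update(arr.tobytes())
    return h.hexdigest()


# At experiment time: generate the stimulus and store its digest with the seed.
original = np.random.RandomState(12345).normal(size=1000)
stored_digest = stimulus_digest(original)

# Later: regenerate from the seed and confirm the result is bit-for-bit identical.
regenerated = np.random.RandomState(12345).normal(size=1000)
assert stimulus_digest(regenerated) == stored_digest
```

If a later numpy version or a different platform produced different samples, the digests would differ and the mismatch would be caught before any analysis relied on the regenerated stimulus.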
On Thu, Nov 6, 2008 at 16:58, Barry Wark <barrywark@gmail.com> wrote:
> In our usage (neural physiology), we've recorded the physiological response to a given stimulus. So being able to recover the _exact_ original stimulus that produced the recorded data is critical. This is why I suggested an API which would let us get an instance of the RandomState as it was at a particular revision (including bugs) so that we could regenerate the exact original sequence. Obviously, we're happy to have the bug fixes in, and continue to use the current RandomState for new experiments.
How big are these stimuli? I'd just store them in an HDF file. Some of those bugs were 32-bit/64-bit differences. Just because your future self can get the same-versioned source code doesn't mean that the results you get will be identical. I think that the versioned API would be a lot of work for a false sense of security.
-- Robert Kern
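A minimal sketch of storing the samples themselves instead of regenerating them. To stay dependency-free it swaps the HDF file for NumPy's own .npy format via `numpy.save` and `numpy.load`; an HDF5 library would fill the same role. The seed, array size, and file name are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Generate the stimulus once, at experiment time.
stimulus = np.random.RandomState(12345).normal(size=1000)

# Store the samples themselves; the path is a throwaway temp file here.
path = os.path.join(tempfile.mkdtemp(), "stimulus_12345.npy")
np.save(path, stimulus)

# Later, load the exact samples back without touching any generator.
restored = np.load(path)
assert np.array_equal(restored, stimulus)
```

The stored file no longer depends on which numpy version or platform generated it, at the cost of disk space proportional to the stimulus size.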
Barry Wark wrote:
> In our usage (neural physiology), we've recorded the physiological response to a given stimulus. So being able to recover the _exact_ original stimulus that produced the recorded data is critical.
I'd be inclined to say that if you really want the exact same string of pseudo-random numbers years into the future, you'd probably be safest to generate a huge string now and save it, rather than expecting a future version of a library, perhaps running on different hardware, etc., to regenerate the exact same thing.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
participants (3)
- Barry Wark
- Christopher Barker
- Robert Kern