On Thu, Nov 6, 2008 at 16:58, Barry Wark email@example.com wrote:
On Thu, Nov 6, 2008 at 1:55 PM, Robert Kern firstname.lastname@example.org wrote:
On Thu, Nov 6, 2008 at 15:12, Barry Wark email@example.com wrote:
On Thu, Nov 6, 2008 at 12:09 PM, Robert Kern firstname.lastname@example.org wrote:
On Thu, Nov 6, 2008 at 14:05, Barry Wark email@example.com wrote:
I'm just about to embark on a long-term research project and was planning to use numpy.random to generate stimuli for our experiments. We plan to store only the parameters and RandomState seed for each stimulus and I'm concerned about stability of the API in the long term: will the parameters and random seed we store now work with future versions of numpy.random?
It should. But just in case, make sure you explicitly instantiate RandomState objects instead of using the functions in numpy.random. That way, should we need to fix some bug that might change the results, you can always pull out the current mtrand code and use it independently.
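A minimal sketch of the advice above: draw from an explicitly instantiated RandomState rather than from the module-level functions. The helper name `make_stimulus`, its parameters, and the seed value are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_stimulus(seed, n_samples, mean=0.0, std=1.0):
    """Generate a Gaussian stimulus from a private RandomState.

    `make_stimulus` and its parameters are hypothetical, for illustration only.
    """
    # Explicit instance: its stream is independent of numpy.random's global state.
    rng = np.random.RandomState(seed)
    return rng.normal(loc=mean, scale=std, size=n_samples)


first = make_stimulus(seed=12345, n_samples=1000)

# The module-level convenience functions use a separate, global RandomState,
# so reseeding or drawing from it does not disturb the explicit instance.
np.random.seed(0)
np.random.uniform(size=10)

second = make_stimulus(seed=12345, n_samples=1000)
assert np.array_equal(first, second)
```

Keeping the generator local to the function also means other code that touches the global state cannot silently change the stimulus.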
That is our working plan, along with recording the numpy.__version__ that was used to generate the original stimulus. Thanks for the confirmation.
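One way the plan just described might look in practice, as a rough sketch: bundle the seed, the stimulus parameters, and numpy.__version__ into a single record that can be written next to the experimental data. The function name `stimulus_record`, the field names, and the JSON encoding are assumptions, not anything specified in this thread.

```python
import json

import numpy as np


def stimulus_record(seed, params):
    """Bundle what is needed to attempt regenerating a stimulus later.

    Hypothetical helper; the field names are illustrative only.
    """
    return {
        "seed": seed,
        "params": params,
        "numpy_version": np.__version__,  # version used to generate the stimulus
    }


record = stimulus_record(
    seed=12345,
    params={"distribution": "normal", "mean": 0.0, "std": 1.0, "n_samples": 1000},
)

# Serialize to text so the record can be stored alongside the recorded data.
serialized = json.dumps(record, sort_keys=True)
restored = json.loads(serialized)
```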
On a side note, this seems like a potentially big issue for many scientific users. Perhaps a policy of noting any incompatible revisions to RandomState in its documentation (if they ever come up) would be useful. Even better would be a module function or class method that returns an instance of RandomState as it was at a particular numpy version:
r = numpy.random.RandomState.from_version(my_numpy_version, seed=None)
Hmm. Sounds like a bit of work. I'll give it a go, if you think this is a valuable approach.
I think I recall that there was a change in the random seed format some time around numpy 1.0.
I don't think I changed it after 1.0. Before 1.0, we explicitly warned people about API instability.
I believe you. We've been developing this app since before numpy 1.0, so I'm sure the issue cropped up from data generated pre-1.0.
Okay. Actually, now that I think about it, there have been changes that would affect results using the nonuniform distributions. These should only have arisen from fixing bugs (i.e. the previous results were wrong, not just different). Do you have any thoughts on how you would want us to handle that case?
In our usage (neural physiology), we've recorded the physiological response to a given stimulus. So being able to recover the _exact_ original stimulus that produced the recorded data is critical. This is why I suggested an API which would let us get an instance of the RandomState as it was at a particular revision (including bugs) so that we could regenerate the exact original sequence. Obviously, we're happy to have the bug fixes in, and continue to use the current RandomState for new experiments.
How big are these stimuli? I'd just store them in an HDF file. Some of those bugs were 32-bit/64-bit differences. Just because your future self can get the same-versioned source code doesn't mean that the results you get will be identical. I think that the versioned API would be a lot of work for a false sense of security.
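A sketch of the "store the stimulus itself" approach suggested above. To stay free of extra dependencies it uses NumPy's own `.npz` container rather than HDF5; with an HDF5 library such as h5py or PyTables the same idea would apply, with the samples as a dataset and the seed and version as attributes. The file name and seed are hypothetical.

```python
import os
import tempfile

import numpy as np

seed = 12345  # arbitrary example seed
rng = np.random.RandomState(seed)
stimulus = rng.normal(size=1000)

# Save the samples themselves, keeping the seed and version as metadata.
path = os.path.join(tempfile.mkdtemp(), "stimulus_0001.npz")  # hypothetical name
np.savez(path, stimulus=stimulus, seed=seed, numpy_version=np.__version__)

# Later: load the stored samples instead of regenerating them.
with np.load(path) as archive:
    loaded = archive["stimulus"]
    stored_seed = int(archive["seed"])
    stored_version = str(archive["numpy_version"])

# The stored array is the ground truth, whatever a future numpy would generate.
assert np.array_equal(loaded, stimulus)
```

With the samples on disk, the seed and version become a cross-check rather than the only route back to the original stimulus.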