[Numpy-discussion] stability of numpy.random.RandomState API?

Robert Kern robert.kern at gmail.com
Thu Nov 6 18:30:02 EST 2008


On Thu, Nov 6, 2008 at 16:58, Barry Wark <barrywark at gmail.com> wrote:
> On Thu, Nov 6, 2008 at 1:55 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Thu, Nov 6, 2008 at 15:12, Barry Wark <barrywark at gmail.com> wrote:
>>> On Thu, Nov 6, 2008 at 12:09 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>>> On Thu, Nov 6, 2008 at 14:05, Barry Wark <barrywark at gmail.com> wrote:
>>>>> I'm just about to embark on a long-term research project and was
>>>>> planning to use numpy.random to generate stimuli for our experiments.
>>>>> We plan to store only the parameters and RandomState seed for each
>>>>> stimulus and I'm concerned about stability of the API in the long
>>>>> term: will the parameters and random seed we store now work with
>>>>> future versions of numpy.random?
>>>>
>>>> It should. But just in case, make sure you explicitly instantiate
>>>> RandomState objects instead of using the functions in numpy.random.
>>>> That way, should we need to fix some bug that might change the
>>>> results, you can always pull out the current mtrand code and use it
>>>> independently.
>>>
>>> That is our working plan, as well as to record the numpy.__version__
>>> which was used to generate the original stimulus. Thanks for the
>>> confirmation.
>>>
>>> On a side note, this seems like a potentially big issue for many
>>> scientific users. Perhaps making a policy of keeping incompatible
>>> revisions to RandomState noted in its documentation (if they ever
>>> come up) would be useful. Even better, a module function or class
>>> method that returns an instance of RandomState as it was at a
>>> particular numpy version:
>>>
>>> r = numpy.random.RandomState.from_version(my_numpy_version, seed=None)
>>>
>>> Hmm. Sounds like a bit of work. I'll give it a go, if you think this
>>> is a valuable approach.
>>>
>>>>
>>>>> I think I recall that there was a
>>>>> change in the random seed format some time around numpy 1.0.
>>>>
>>>> I don't think I changed it after 1.0. Before 1.0, we explicitly warned
>>>> people about API instability.
>>>
>>> I believe you. We've been developing this app since before numpy 1.0,
>>> so I'm sure the issue cropped up from data generated pre-1.0.
>>
>> Okay. Actually, now that I think about it, there have been changes
>> that would affect results using the nonuniform distributions. These
>> should only have arisen from fixing bugs (i.e. the previous results
>> were wrong, not just different). Do you have any thoughts on how you
>> would want us to handle that case?
>
> In our usage (neural physiology), we've recorded the physiological
> response to a given stimulus. So being able to recover the _exact_
> original stimulus that produced the recorded data is critical. This is
> why I suggested an API which would let us get an instance of the
> RandomState as it was at a particular revision (including bugs) so
> that we could regenerate the exact original sequence. Obviously, we're
> happy to have the bug fixes in, and continue to use the current
> RandomState for new experiments.

How big are these stimuli? I'd just store them in an HDF file. Some of
those bugs were 32-bit/64-bit differences. Just because your future
self can get the same-versioned source code doesn't mean that the
results you get will be identical. I think that the versioned API
would be a lot of work for a false sense of security.
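
A minimal sketch of the two suggestions in this thread, under stated
assumptions: instantiate RandomState explicitly instead of using the
numpy.random module functions, and store the generated stimulus itself
together with the seed and numpy.__version__. The helper names
(make_stimulus, save_stimulus, load_stimulus) and the choice of a
standard normal stimulus are hypothetical. numpy.savez stands in for an
HDF file here only so that the example depends on NumPy alone; an HDF5
library would fill the same role.

```python
import io

import numpy as np


def make_stimulus(seed, n_samples):
    """Generate a stimulus from an explicitly instantiated RandomState."""
    # Hypothetical stimulus: standard normal noise. An explicit instance
    # avoids depending on the global state behind numpy.random functions.
    rng = np.random.RandomState(seed)
    return rng.standard_normal(n_samples)


def save_stimulus(fileobj, stimulus, seed):
    """Store the stimulus values along with the seed and NumPy version."""
    # Saving the array itself means recovery does not depend on a future
    # NumPy (or platform) reproducing the same random stream.
    np.savez(fileobj, stimulus=stimulus, seed=seed,
             numpy_version=np.__version__)


def load_stimulus(fileobj):
    """Return (stimulus, seed, numpy_version) from a saved archive."""
    with np.load(fileobj) as data:
        return (data["stimulus"], int(data["seed"]),
                str(data["numpy_version"]))


if __name__ == "__main__":
    buf = io.BytesIO()  # an on-disk file path works the same way
    stim = make_stimulus(seed=12345, n_samples=1000)
    save_stimulus(buf, stim, seed=12345)
    buf.seek(0)
    recovered, seed, version = load_stimulus(buf)
    print(np.array_equal(recovered, stim), seed, version)
```

With the array stored, the seed and version serve as provenance rather
than as the only route back to the original stimulus.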

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


