[Numpy-discussion] NEP: Random Number Generator Policy

Mon Jun 11 01:06:11 EDT 2018

On Sun, Jun 10, 2018 at 7:46 PM <josef.pktd at gmail.com> wrote:
>
> On Sun, Jun 10, 2018 at 9:08 PM, Robert Kern <robert.kern at gmail.com>
wrote:
>>
>> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers <ralf.gommers at gmail.com>
wrote:
>> >
>> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern <robert.kern at gmail.com>
wrote:
>> >>
>> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gommers at gmail.com>
wrote:
>> >>>
>> >>> It may be worth having a look at test suites for scipy, statsmodels,
scikit-learn, etc. and estimate how much work this NEP causes those
projects. If the devs of those packages are forced to do large scale
migrations from RandomState to StableState, then why not instead keep
RandomState and just add a new API next to it?
>> >>
>> >> The problem is that we can't really have an ecosystem with two
different general purpose systems.
>> >
>> > Can't = prefer not to.
>>
>> I meant what I wrote. :-)
>>
>> > But yes, that's true. That's not what I was saying though. We want one
generic one, and one meant for unit testing only. You can achieve that in
two ways:
>> > 1. Change the current np.random API to new generic, and add a new
RandomStable for unit tests.
>> > 2. Add a new generic API, and document the current np.random API as
being meant for unit tests only, for other usage <new API> should be
preferred.
>> >
>> > (2) has a couple of pros:
>> > - you're not forcing almost every library and end user out there to
migrate their unit tests.
>>
>> But it has the cons that I talked about. RandomState *is* a fully
functional general purpose PRNG system. After all, that's its current use.
Documenting it as intended to be something else will not change that fact.
Documentation alone provides no real impetus to move to the new system
outside of the unit tests. And the community does need to move together to
the new system in their library code, or else we won't be able to combine
libraries together; these PRNG objects need to thread all the way through
between code from different authors if we are to write programs with a
controlled seed. The failure mode when people don't pay attention to the
documentation is that I can no longer write programs that compose these
libraries together. That's why I wrote "can't". It's not a mere preference
for not having two systems to maintain. It has binary Go/No Go implications
for building reproducible programs.
>
> I don't understand this part.
> For example, scipy.stats and scikit-learn allow the user to provide a
RandomState instance to the functions. I don't see why you want to force
down stream libraries to change this. A random state argument should be
(essentially) compatible with whatever the user uses, and there is no
reason to force packages to update there internal use like in unit tests if
they don't want to, e.g. because of the instability.
>
> Aside to statsmodels: We currently have very few user facing random
functions, those are just in maybe 3 to 5 places where we have simulated or
bootstrap values.
> Most of the other uses of np.random are in unit tests and some in the
documentation examples.

Please consider my alternative proposal. Your feedback has convinced me
that that's a better approach than the StableRandom as laid out in the NEP.
I'm even willing to not deprecate the name RandomState.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180610/4f129d5b/attachment-0001.html>