[Numpy-discussion] NEP: Random Number Generator Policy

Sun Jun 10 22:45:11 EDT 2018

On Sun, Jun 10, 2018 at 9:08 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Sun, Jun 10, 2018 at 5:27 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >
> > On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern <robert.kern at gmail.com>
> wrote:
> >>
> >> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >>>
> >>> It may be worth having a look at test suites for scipy, statsmodels,
> scikit-learn, etc. and estimate how much work this NEP causes those
> projects. If the devs of those packages are forced to do large scale
> migrations from RandomState to StableState, then why not instead keep
> RandomState and just add a new API next to it?
> >>
> >> The problem is that we can't really have an ecosystem with two
> different general purpose systems.
> >
> > Can't = prefer not to.
>
> I meant what I wrote. :-)
>
> > But yes, that's true. That's not what I was saying though. We want one
> generic one, and one meant for unit testing only. You can achieve that in
> two ways:
> > 1. Change the current np.random API to new generic, and add a new
> RandomStable for unit tests.
> > 2. Add a new generic API, and document the current np.random API as
> being meant for unit tests only, for other usage <new API> should be
> preferred.
> >
> > (2) has a couple of pros:
> > - you're not forcing almost every library and end user out there to
> migrate their unit tests.
>
> But it has the cons that I talked about. RandomState *is* a fully
> functional general purpose PRNG system. After all, that's its current use.
> Documenting it as intended to be something else will not change that fact.
> Documentation alone provides no real impetus to move to the new system
> outside of the unit tests. And the community does need to move together to
> the new system in their library code, or else we won't be able to combine
> libraries together; these PRNG objects need to thread all the way through
> between code from different authors if we are to write programs with a
> controlled seed. The failure mode when people don't pay attention to the
> documentation is that I can no longer write programs that compose these
> libraries together. That's why I wrote "can't". It's not a mere preference
> for not having two systems to maintain. It has binary Go/No Go implications
> for building reproducible programs.
>

I don't understand this part.
For example, scipy.stats and scikit-learn allow the user to provide a
RandomState instance to the functions. I don't see why you want to force
down stream libraries to change this. A random state argument should be
(essentially) compatible with whatever the user uses, and there is no
reason to force packages to update there internal use like in unit tests if
they don't want to, e.g. because of the instability.

Aside to statsmodels: We currently have very few user facing random
functions, those are just in maybe 3 to 5 places where we have simulated or
bootstrap values.
Most of the other uses of np.random are in unit tests and some in the
documentation examples.

Josef

>
> > - more design freedom for the new generic API. The current one is
> clearly sub-optimal; in a new one you wouldn't have to expose all the
> global state/functions that np.random exposes now. You could even restrict
> it to a single class and put that in the main numpy namespace.
>
> I'm not sure why you are talking about the global state and np.random.*
> convenience functions. What we do with those functions is out of scope for
> this NEP and would be talked about it another NEP fully introducing the new
> system.
>
> >> To properly use pseudorandom numbers, I need to instantiate a PRNG and
> thread it through all of the code in my program: both the parts that I
> write and the third party libraries that I don't write.
> >>
> >> Generating test data for unit tests is separable, though. That's why I
> propose having a StableRandom built on the new architecture. Its purpose
> would be well-documented, and in my proposal is limited in features such
> that it will be less likely to be abused outside of that purpose. If you
> make it fully-featured, it is more likely to be abused by building library
> code around it. But even if it is so abused, because it is built on the new
> architecture, at least I can thread the same core PRNG state through the
> StableRandom distributions from the abusing library and use the better
> distributions class elsewhere (randomgen names it "Generator"). Just
> keeping RandomState around can't work like that because it doesn't have a
> replaceable core PRNG.
> >>
> >> But that does suggest another alternative that we should explore:
> >>
> >> The new architecture separates the core uniform PRNG from the wide
> variety of non-uniform probability distributions. That is, the core PRNG
> state is encapsulated in a discrete object that can be shared between
> instances of different distribution-providing classes. numpy.random should
> provide two such distribution-providing classes. The main one (let us call
> it ``Generator``, as it is called in the prototype) will follow the new
> policy: distribution methods can break the stream in feature releases.
> There will also be a secondary distributions class (let us call it
> ``LegacyGenerator``) which contains distribution methods exactly as they
> exist in the current ``RandomState`` implementation. When one combines
> ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the
> exact same stream as ``RandomState`` for all distribution methods. The
> ``LegacyGenerator`` methods will be forever frozen.
> ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with
> the MT19937 core PRNG, and whatever tricks needed to make
> ``isinstance(prng, RandomState)`` and unpickling work should be done. This
> way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be
> deprecated, becoming progressively noisier over a number of release cycles,
> in favor of explicitly instantiating ``LegacyGenerator``.
> >>
> >> ``LegacyGenerator`` CAN be used during this deprecation period in
> library and application code until libraries and applications can migrate
> to the new ``Generator``. Libraries and applications SHOULD migrate but
> MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test
> data for unit tests where cross-release stability of the streams is
> important. Test writers SHOULD consider ways to mitigate their reliance on
> such stability and SHOULD limit their usage to distribution methods that
> have fewer cross-platform stability risks.
>
> I would appreciate your consideration of this proposal. Does it address
> your concerns? It addresses my concerns with keeping around a
> fully-functional RandomState implementation.
>
> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180610/95eafc54/attachment.html>