[Numpy-discussion] NEP: Random Number Generator Policy

Ralf Gommers ralf.gommers at gmail.com
Sun Jun 10 20:26:35 EDT 2018

On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>> It may be worth having a look at test suites for scipy, statsmodels,
>> scikit-learn, etc. and estimate how much work this NEP causes those
>> projects. If the devs of those packages are forced to do large-scale
>> migrations from RandomState to StableRandom, then why not instead keep
>> RandomState and just add a new API next to it?
> The problem is that we can't really have an ecosystem with two different
> general purpose systems.

Can't = prefer not to. But yes, that's true. That's not what I was saying
though. We want one generic one, and one meant for unit testing only. You
can achieve that in two ways:
1. Change the current np.random API into the new generic one, and add a new
StableRandom for unit tests.
2. Add a new generic API, and document the current np.random API as being
meant for unit tests only; for other usage, <new API> should be preferred.

(2) has a couple of pros:
- you're not forcing almost every library and end user out there to migrate
their unit tests.
- more design freedom for the new generic API. The current one is clearly
sub-optimal; in a new one you wouldn't have to expose all the global
state/functions that np.random exposes now. You could even restrict it to a
single class and put that in the main numpy namespace.


> To properly use pseudorandom numbers, I need to instantiate a PRNG and
> thread it through all of the code in my program: both the parts that I
> write and the third party libraries that I don't write.
>
> Generating test data for unit tests is separable, though. That's why I
> propose having a StableRandom built on the new architecture. Its purpose
> would be well-documented, and in my proposal is limited in features such
> that it will be less likely to be abused outside of that purpose. If you
> make it fully-featured, it is more likely to be abused by building library
> code around it. But even if it is so abused, because it is built on the new
> architecture, at least I can thread the same core PRNG state through the
> StableRandom distributions from the abusing library and use the better
> distributions class elsewhere (randomgen names it "Generator"). Just
> keeping RandomState around can't work like that because it doesn't have a
> replaceable core PRNG.
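
To make the "thread it through" point concrete, here is a minimal sketch.
It uses only the standard library's random.Random as a stand-in for whatever
PRNG object numpy ends up providing; the function names are made up for
illustration and are not part of any proposal.

```python
import random


def simulate_noise(n, rng):
    """Library-style function: draws only from the PRNG it is handed."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def bootstrap_means(data, n_resamples, rng):
    """A second consumer of the same stream; no hidden global state."""
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return means


def run_experiment(seed):
    rng = random.Random(seed)        # one PRNG, instantiated at the top level
    data = simulate_noise(50, rng)   # ...and passed explicitly to every consumer
    return bootstrap_means(data, 10, rng)


first = run_experiment(2018)
random.seed(0)
random.random()                      # unrelated use of the global generator
second = run_experiment(2018)
assert first == second               # the threaded stream is unaffected
```

Because nothing in the call chain touches the module-level generator, the
result depends only on the seed passed in at the top.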
>
> But that does suggest another alternative that we should explore:
>
> The new architecture separates the core uniform PRNG from the wide variety
> of non-uniform probability distributions. That is, the core PRNG state is
> encapsulated in a discrete object that can be shared between instances of
> different distribution-providing classes. numpy.random should provide two
> such distribution-providing classes. The main one (let us call it
> ``Generator``, as it is called in the prototype) will follow the new
> policy: distribution methods can break the stream in feature releases.
> There will also be a secondary distributions class (let us call it
> ``LegacyGenerator``) which contains distribution methods exactly as they
> exist in the current ``RandomState`` implementation. When one combines
> ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the
> exact same stream as ``RandomState`` for all distribution methods. The
> ``LegacyGenerator`` methods will be forever frozen.
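
The split described above can be sketched in a few lines. This is only an
illustration under stated assumptions: CorePRNG is a hypothetical wrapper
around the standard library's Mersenne Twister, and the two normal-variate
algorithms are placeholders, not the ones numpy uses. The class names
Generator and LegacyGenerator are borrowed from the quoted text.

```python
import math
import random


class CorePRNG:
    """Hypothetical core uniform PRNG (wraps the stdlib Mersenne Twister)."""

    def __init__(self, seed):
        self._mt = random.Random(seed)

    def random(self):
        return self._mt.random()     # uniform float in [0, 1)


class LegacyGenerator:
    """Distribution methods that are frozen (placeholder: Box-Muller)."""

    def __init__(self, core):
        self.core = core

    def standard_normal(self):
        u1 = 1.0 - self.core.random()    # in (0, 1], avoids log(0)
        u2 = self.core.random()
        return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


class Generator:
    """Distribution methods that may change (placeholder: polar method)."""

    def __init__(self, core):
        self.core = core

    def standard_normal(self):
        while True:
            x = 2.0 * self.core.random() - 1.0
            y = 2.0 * self.core.random() - 1.0
            s = x * x + y * y
            if 0.0 < s < 1.0:
                return x * math.sqrt(-2.0 * math.log(s) / s)


core = CorePRNG(12345)
legacy = LegacyGenerator(core)       # both distribution classes share
modern = Generator(core)             # one core PRNG state

a = legacy.standard_normal()
b = modern.standard_normal()         # continues where the legacy draw stopped

# Same core seed plus frozen methods reproduces the legacy stream exactly.
assert LegacyGenerator(CorePRNG(12345)).standard_normal() == a
```

The point of the design is visible in the last line: stream stability comes
from freezing the distribution methods, while the core PRNG stays a
replaceable object that both classes can share.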
>
> ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with
> the MT19937 core PRNG, and whatever tricks needed to make
> ``isinstance(prng, RandomState)`` and unpickling work should be done. This
> way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be
> deprecated, becoming progressively noisier over a number of release cycles,
> in favor of explicitly instantiating ``LegacyGenerator``.
>
> ``LegacyGenerator`` CAN be used during this deprecation period in library
> and application code until libraries and applications can migrate to the
> new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT
> be forced to. ``LegacyGenerator`` CAN be used to generate test data for
> unit tests where cross-release stability of the streams is important. Test
> writers SHOULD consider ways to mitigate their reliance on such stability
> and SHOULD limit their usage to distribution methods that have fewer
> cross-platform stability risks.
> --
> Robert Kern
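
One way to "mitigate reliance on such stability" in a test suite is to
assert properties that hold for any stream instead of comparing against
recorded values. A rough sketch, again with the standard library only;
standardize() is a made-up function under test, not something from numpy or
scipy:

```python
import math
import random
import statistics


def standardize(values):
    """Function under test: shift and scale to zero mean, unit variance."""
    mean = sum(values) / len(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def check_standardize(seed):
    # The seed only makes failures reproducible; the assertions below do
    # not depend on the exact numbers the PRNG happens to produce.
    rng = random.Random(seed)
    data = [rng.gauss(10.0, 2.0) for _ in range(200)]
    result = standardize(data)
    mean = sum(result) / len(result)
    assert math.isclose(mean, 0.0, abs_tol=1e-9)
    assert math.isclose(statistics.pstdev(result), 1.0, rel_tol=1e-9)


for seed in range(5):
    check_standardize(seed)
```

Tests written this way keep passing if a distribution method changes its
stream, so only the tests that truly need bit-for-bit data have to stay on
the frozen class.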