[Numpy-discussion] NEP: Random Number Generator Policy

Robert Kern robert.kern at gmail.com
Sun Jun 3 20:21:56 EDT 2018


Moving some of the GitHub PR comments here:

> Implementation
> --------------
>
> We propose first freezing ``RandomState`` as it is and developing a new RNG
> subsystem alongside it.  This allows anyone who has been relying on our old
> stream-compatibility guarantee to have plenty of time to migrate.
> ``RandomState`` will be considered deprecated, but with a long deprecation
> cycle, at least a few years.
>

https://github.com/numpy/numpy/pull/11229#discussion_r192604195
@bashtage writes:
> RandomState could pretty easily be spun out into a stand-alone package,
> if useful. It is effectively a stand-alone submodule already.

Indeed. That would be a graceful forever-home for the code for anyone who
needs it. However, I'd still only make that switch after at least a few
years of deprecation inside numpy. And maybe a 2.0.0 release.


> Any new design for the RNG subsystem will provide a choice of different core
> uniform PRNG algorithms.  We will be more strict about a select subset of
> methods on these core PRNG objects.  They MUST guarantee stream-compatibility
> for a minimal, specified set of methods which are chosen to make it easier to
> compose them to build other distributions.  Namely,
>
>     * ``.bytes()``
>     * ``.random_uintegers()``
>     * ``.random_sample()``
>

BTW, `random_uintegers()` is a new method in Kevin Sheppard's `randomgen`,
and I am referring to its semantics here.
https://github.com/bashtage/randomgen/blob/master/randomgen/generator.pyx#L191

https://github.com/numpy/numpy/pull/11229#discussion_r192604275
@bashtage writes:
> One of these (bytes, uintegers) seems redundant. uintegers should
> probably be 64 bit.

Because different core generators have different "native" outputs (MT19937
and PCG32 output `uint32`s, PCG64 outputs `uint64`s, and some generators that
I hope we never implement natively output doubles), there are some simple but
non-trivial choices to make to support each of these. I would like the core
generator's author to make those choices and maintain them. They're not
hard, but they are the kind of thing that ought to be decided once and
consistently.

I am of the opinion that `uintegers` should support at least `uint32` and
`uint64` as those are the most common native outputs among core generators.
There should be a maintained way to get that native format (and yes, I'd
rather have the user be explicit about it than have `random_native_uint()`
in addition to `random_uint64()`).
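To make the "simple, but non-trivial" choices concrete, here is a minimal
sketch of one of them: building a `uint64` from a generator whose native output
is `uint32`. `ToyCore32` and its method names are hypothetical, and the LCG
inside it is only a stand-in for a real core generator such as MT19937 or
PCG32. The point is that the word order is arbitrary, but once picked it
defines the stream.

```python
class ToyCore32:
    """Hypothetical core PRNG with a native uint32 output (illustration only)."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_uint32(self):
        # A plain 32-bit LCG as a stand-in; not a generator anyone should use.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

    def next_uint64(self):
        # Choice A: the first native draw becomes the high word.
        hi = self.next_uint32()
        lo = self.next_uint32()
        return (hi << 32) | lo

    def next_uint64_low_first(self):
        # Choice B: the first native draw becomes the low word.
        lo = self.next_uint32()
        hi = self.next_uint32()
        return (hi << 32) | lo


# Same seed, same native draws, different uint64 streams.
a = ToyCore32(42).next_uint64()
b = ToyCore32(42).next_uint64_low_first()
print(hex(a), hex(b))
```

Both conventions are defensible, which is why I would rather see one of them
fixed and maintained next to the core generator than re-derived by each caller.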

This argument extends to `.bytes()`, too, now that I think about it. A
stream of bytes is a native format for some generators, too, for example if
we decide to hook up /dev/urandom or another file-backed interface.
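Going the other way, from native words to bytes, involves the same kind of
decision. A rough sketch, assuming a hypothetical helper `bytes_from_uint32()`
that takes any callable returning 32-bit integers; the little-endian packing
and the discarding of leftover bytes are choices made for this example, not
anything numpy or randomgen is known to guarantee.

```python
import struct


def bytes_from_uint32(next_uint32, n):
    """Return n bytes built from 32-bit draws (hypothetical helper).

    Each draw is packed little-endian; unused bytes of the last draw are
    discarded rather than buffered for the next call.
    """
    out = bytearray()
    while len(out) < n:
        out += struct.pack("<I", next_uint32())
    return bytes(out[:n])


# A fixed word source makes the byte order visible.
words = iter([0x04030201, 0x08070605])
print(bytes_from_uint32(lambda: next(words), 6))  # b'\x01\x02\x03\x04\x05\x06'
```

Big-endian packing, or buffering the two leftover bytes, would each give a
different but equally valid byte stream.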

Hmm, what do you think about adding `random_interval()` to this list? And
raising that up to the Python API level (a la what Python 3 did with
exposing `secrets.randbelow()` as a primitive)?
https://github.com/bashtage/randomgen/blob/master/randomgen/src/distributions/distributions.c#L1164-L1200

Many, many uses of this method would be with numbers much less than 1<<32
(e.g. Fisher-Yates shuffle), and for the 32-bit native PRNGs that could mean
using half as many core PRNG draws if `random_interval()` is implemented
along with the core PRNG to make use of that fact.
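A minimal sketch of that idea, in pure Python rather than C. It uses masked
rejection sampling and treats the upper bound as inclusive; both are
assumptions of this sketch, as are the function signatures, which take a
callable returning 32-bit integers in place of a real core PRNG.

```python
def random_interval(next_uint32, max_value):
    """Return an integer uniform on [0, max_value] by masked rejection."""
    if not 0 <= max_value < 2**64:
        raise ValueError("max_value must fit in an unsigned 64-bit integer")
    if max_value == 0:
        return 0
    # Smallest all-ones mask that covers max_value.
    mask = (1 << max_value.bit_length()) - 1
    while True:
        if max_value <= 0xFFFFFFFF:
            # One native draw per attempt is enough for small bounds.
            value = next_uint32() & mask
        else:
            # Larger bounds need two native draws per attempt.
            value = ((next_uint32() << 32) | next_uint32()) & mask
        if value <= max_value:
            return value


def shuffle(next_uint32, items):
    """In-place Fisher-Yates shuffle driven by random_interval()."""
    for i in range(len(items) - 1, 0, -1):
        j = random_interval(next_uint32, i)
        items[i], items[j] = items[j], items[i]
```

For a list shorter than 1<<32 elements, every call in `shuffle()` takes the
single-draw branch, which is where the saving over always assembling a
`uint64` comes from.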

> The list of ``StableRandom`` methods should be chosen to support unit tests:
>
>     * ``.randint()``
>     * ``.uniform()``
>     * ``.normal()``
>     * ``.standard_normal()``
>     * ``.choice()``
>     * ``.shuffle()``
>     * ``.permutation()``
>

https://github.com/numpy/numpy/pull/11229#discussion_r192604311
@bashtage writes:
> standard_gamma and standard_exponential are important enough to be
> included here IMO.

"Importance" was not my criterion, only whether they are used in unit test
suites. This list was just off the top of my head for methods that I think
were actually used in test suites, so I'd be happy to be shown live tests
that use other methods. I'd like to be a *little* conservative about what
methods we stick in here, but we don't have to be *too* conservative, since
we are explicitly never going to be modifying these.

-- 
Robert Kern