[Numpy-discussion] NEP: Random Number Generator Policy

Sun Jun 3 20:45:31 EDT 2018

On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.kern at gmail.com> wrote:

> Moving some of the Github PR comments here:
>
>> Implementation
>> --------------
>>
>> We propose first freezing ``RandomState`` as it is and developing a new RNG
>> subsystem alongside it.  This allows anyone who has been relying on our old
>> stream-compatibility guarantee to have plenty of time to migrate.
>> ``RandomState`` will be considered deprecated, but with a long deprecation
>> cycle, at least a few years.
>>
>
> https://github.com/numpy/numpy/pull/11229#discussion_r192604195
> @bashtage writes:
> > RandomState could pretty easily be spun out into a stand-alone package,
> > if useful. It is effectively a stand-alone submodule already.
>
> Indeed. That would be a graceful forever-home for the code for anyone who
> needs it. However, I'd still only make that switch after at least a few
> years of deprecation inside numpy. And maybe a 2.0.0 release.
>
>
>> Any new design for the RNG subsystem will provide a choice of different core
>> uniform PRNG algorithms.  We will be more strict about a select subset of
>> methods on these core PRNG objects.  They MUST guarantee stream-compatibility
>> for a minimal, specified set of methods which are chosen to make it easier to
>> compose them to build other distributions.  Namely,
>>
>>     * ``.bytes()``
>>     * ``.random_uintegers()``
>>     * ``.random_sample()``
>>
>
> BTW, `random_uintegers()` is a new method in Kevin Sheppard's `randomgen`,
> and I am referring to its semantics here.
> https://github.com/bashtage/randomgen/blob/master/randomgen/generator.pyx#L191
>
> https://github.com/numpy/numpy/pull/11229#discussion_r192604275
> @bashtage writes:
> > One of these (bytes, uintegers) seems redundant. uintegers should
> > probably be 64 bit.
>
> Because different core generators have different "native" outputs
> (MT19937 and PCG32 output `uint32`s, PCG64 outputs `uint64`s, and some that I
> hope we never implement natively output doubles), there are some simple,
> but non-trivial choices to make to support each of these. I would like the
> core generator's author to make those choices and maintain them. They're
> not hard, but they are the kind of thing that ought to be decided once and
> consistently.
>
> I am of the opinion that `uintegers` should support at least `uint32` and
> `uint64` as those are the most common native outputs among core generators.
> There should be a maintained way to get that native format (and yes, I'd
> rather have the user be explicit about it than have `random_native_uint()`
> in addition to `random_uint64()`).
>
> This argument extends to `.bytes()`, too, now that I think about it. A
> stream of bytes is a native format for some generators, too, for example if
> we decide to hook up /dev/urandom or another file-backed interface.
>
> Hmm, what do you think about adding `random_interval()` to this list? And
> raising that up to the Python API level (a la what Python 3 did with
> exposing `secrets.randbelow()` as a primitive)?
> https://github.com/bashtage/randomgen/blob/master/randomgen/src/distributions/distributions.c#L1164-L1200
>
> Many, many uses of this method would be with numbers much less than 1<<32
> (e.g. Fisher-Yates shuffle), and for the 32-bit native PRNGs could mean
> using half as many core PRNG draws if `random_interval()` is implemented
> along with the core PRNG to make use of that fact.
>
>> The list of ``StableRandom`` methods should be chosen to support unit
>> tests:
>>
>>     * ``.randint()``
>>     * ``.uniform()``
>>     * ``.normal()``
>>     * ``.standard_normal()``
>>     * ``.choice()``
>>     * ``.shuffle()``
>>     * ``.permutation()``
>>
>
> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
> @bashtage writes:
> > standard_gamma and standard_exponential are important enough to be
> > included here IMO.
>
> "Importance" was not my criterion, only whether they are used in unit test
> suites. This list was just off the top of my head for methods that I think
> were actually used in test suites, so I'd be happy to be shown live tests
> that use other methods. I'd like to be a *little* conservative about what
> methods we stick in here, but we don't have to be *too* conservative, since
> we are explicitly never going to be modifying these.
>

That's one area where I think the selection is too narrow.
We should be able to get a stable stream for some distributions by
transforming the stable uniform stream.
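
As a minimal sketch of what building on the uniform could look like, here is
inverse-CDF sampling of an exponential that uses only ``random_sample()``.
The helper name ``exponential_from_uniform`` is made up for illustration, and
the assumption is that only the uniform draw carries a stability guarantee.

```python
import numpy as np


def exponential_from_uniform(rs, scale, size):
    """Hypothetical helper: exponential variates by inverting the CDF."""
    u = rs.random_sample(size)        # uniform draws on [0, 1)
    return -scale * np.log1p(-u)      # inverse CDF: -scale * log(1 - u)


rs = np.random.RandomState(12345)
x = exponential_from_uniform(rs, scale=2.0, size=100000)
```

A test written this way depends only on the uniform stream and on ``log1p``,
not on how the library chooses to implement its own exponential sampler.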

However, according to the Wikipedia description, Poisson doesn't look easy
to generate that way.
I just wrote a unit test for statsmodels using Poisson random numbers with
hard-coded numbers for the regression tests.
I'm not sure which other distributions are common enough and not easily
reproducible by transformation. E.g. negative binomial can be reproduced by
a gamma-Poisson mixture.
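
A rough sketch of that mixture, assuming the (n, p) parameterization used by
``negative_binomial`` (mean n*(1-p)/p). The helper name
``negative_binomial_from_mixture`` is hypothetical. Note that it still needs
stable ``gamma`` and ``poisson`` draws underneath, so it only moves the
problem.

```python
import numpy as np


def negative_binomial_from_mixture(rs, n, p, size):
    """Hypothetical helper: NB(n, p) as a gamma-Poisson mixture."""
    # Rate for each draw: Gamma(shape=n, scale=(1 - p) / p)
    lam = rs.gamma(n, (1.0 - p) / p, size)
    # Conditional on its rate, each count is Poisson distributed
    return rs.poisson(lam)


rs = np.random.RandomState(12345)
counts = negative_binomial_from_mixture(rs, n=5, p=0.5, size=200000)
# Sample mean and variance should be near n*(1-p)/p and n*(1-p)/p**2
```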

On the other hand, normal can easily be recreated from standard_normal.
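
For illustration, a small sketch with the current ``RandomState``: shifting and
scaling ``standard_normal`` draws should agree with ``normal`` called from the
same seed. The seed and parameters are arbitrary.

```python
import numpy as np

loc, scale = 2.0, 0.5

# Standard normal draws, shifted and scaled by hand
rs = np.random.RandomState(12345)
x = loc + scale * rs.standard_normal(5)

# The same seed, drawing from normal() directly
rs2 = np.random.RandomState(12345)
y = rs2.normal(loc, scale, 5)

same = np.allclose(x, y)
```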

Would it be difficult to keep this list large, given that it should be
frozen, low-maintenance code?

Josef

>
> --
> Robert Kern
>