[Numpy-discussion] NEP: Random Number Generator Policy

Robert Kern robert.kern at gmail.com
Sun Jun 3 21:08:38 EDT 2018

On Sun, Jun 3, 2018 at 5:46 PM <josef.pktd at gmail.com> wrote:

> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>> The list of ``StableRandom`` methods should be chosen to support unit
>>> tests:
>>>     * ``.randint()``
>>>     * ``.uniform()``
>>>     * ``.normal()``
>>>     * ``.standard_normal()``
>>>     * ``.choice()``
>>>     * ``.shuffle()``
>>>     * ``.permutation()``
>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
>> @bashtage writes:
>> > standard_gamma and standard_exponential are important enough to be
>> > included here IMO.
>> "Importance" was not my criterion, only whether they are used in unit
>> test suites. This list was just off the top of my head for methods that I
>> think were actually used in test suites, so I'd be happy to be shown live
>> tests that use other methods. I'd like to be a *little* conservative about
>> what methods we stick in here, but we don't have to be *too* conservative,
>> since we are explicitly never going to be modifying these.
> That's one area where I thought the selection is too narrow.
> We should be able to get a stable stream from the uniform for some
> distributions.
> However, going by the Wikipedia description, Poisson doesn't look easy to
> build from uniforms.
> I just wrote a unit test for statsmodels using Poisson random numbers with
> hard-coded values for the regression tests.
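As a rough sketch of deriving a distribution from a stable uniform stream, the inverse-transform method turns uniforms into exponential draws: if U is uniform on [0, 1), then -log(1 - U) is standard exponential. The helper name `exponential_from_uniform` is hypothetical, and its output is not claimed to match `RandomState.standard_exponential` bit for bit.

```python
import numpy as np

def exponential_from_uniform(rs, scale=1.0, size=None):
    """Hypothetical helper: exponential draws built only from rs.uniform().

    Inverse transform: if U ~ Uniform[0, 1), then -log(1 - U) ~ Exponential(1),
    so the result is reproducible whenever the uniform stream is.
    """
    u = rs.uniform(0.0, 1.0, size)
    # log1p(-u) == log(1 - u); since u < 1 the argument of the log is never zero
    return -scale * np.log1p(-u)

rs = np.random.RandomState(12345)
x = exponential_from_uniform(rs, scale=2.0, size=100_000)
print(x.mean())  # should be close to the scale parameter
```

The same idea works for any distribution with a closed-form inverse CDF; distributions such as Poisson need rejection or search loops instead, which is what makes them harder to pin down.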

I'd really rather people do this than use StableRandom; this is best
practice, as I see it, if your tests involve making precise comparisons to
expected results.
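A minimal sketch of that practice, assuming a hypothetical `poisson_rate_mle` function under test: the input counts are written into the test as literals (illustrative values, not a recorded draw), and the expected results are worked out independently, so no generator stream is involved at all.

```python
import numpy as np

# Hypothetical function under test: Poisson MLE of the rate and its standard error.
def poisson_rate_mle(counts):
    counts = np.asarray(counts, dtype=float)
    rate = counts.mean()
    se = np.sqrt(rate / counts.size)
    return rate, se

def test_poisson_rate_mle_hardcoded():
    # Input data is a literal, not regenerated from a seeded RNG, so the test
    # cannot break when a generator's stream changes.
    counts = [3, 1, 4, 1, 5, 9, 2, 6]
    rate, se = poisson_rate_mle(counts)
    # Expected values worked out by hand: 31 / 8 and sqrt(3.875 / 8).
    np.testing.assert_allclose(rate, 3.875)
    np.testing.assert_allclose(se, np.sqrt(3.875 / 8))

test_poisson_rate_mle_hardcoded()
```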

StableRandom is intended as a crutch so that the pain of moving existing
unit tests away from the deprecated RandomState is less onerous. I'd really
rather people write better unit tests!

In particular, I do not want to add any of the integer-domain distributions
(aside from shuffle/permutation/choice) as these are the ones that have the
platform-dependency issues with respect to 32/64-bit `long` integers.
They'd be unreliable for unit tests even if we kept them stable over time.
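For illustration, the width of C `long` can be checked with the standard library's `ctypes`: it is 32 bits on Windows and 64 bits on typical 64-bit Linux and macOS. The helpers below are hypothetical and only show why an integer result bounded by `long` can differ between platforms; they do not reproduce NumPy's sampling code.

```python
import ctypes

def c_long_bits():
    # ctypes reports the size of the platform's C `long` in bytes
    return 8 * ctypes.sizeof(ctypes.c_long)

def fits_in_c_long(value):
    # Signed range of a two's-complement integer with c_long_bits() bits
    bits = c_long_bits()
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return lo <= value <= hi

print(c_long_bits())           # 32 on Windows, 64 on typical 64-bit Linux/macOS
print(fits_in_c_long(2 ** 40))  # False where long is 32 bits, True where it is 64
```

A test whose expected values depend on an integer sampler limited to this range can therefore pass on one operating system and fail on another, independent of any stream-stability guarantee.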

> I'm not sure which other distributions are common enough and not easily
> reproducible by transformation. E.g. negative binomial can be reproduced by
> a gamma-Poisson mixture.
> On the other hand normal can be easily recreated from standard_normal.
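Both transformations mentioned above can be sketched with the legacy `RandomState` API. `normal(loc, scale)` is recreated from `standard_normal()` by an affine transform (with the same seed the two arrays should coincide), and the negative binomial is drawn as a gamma-Poisson mixture. The mixture helper is hypothetical and matches the negative binomial in distribution only, not draw for draw.

```python
import numpy as np

# normal(loc, scale) recreated as an affine transform of standard_normal()
loc, scale = 2.0, 3.0
direct = np.random.RandomState(0).normal(loc, scale, size=5)
derived = loc + scale * np.random.RandomState(0).standard_normal(size=5)

def negative_binomial_via_mixture(rs, n, p, size):
    """Hypothetical helper: failures before n successes, as a gamma-Poisson mixture."""
    # Per-draw rate: Gamma with shape n and scale (1 - p) / p
    lam = rs.gamma(shape=n, scale=(1.0 - p) / p, size=size)
    # Conditional on its rate, each count is Poisson
    return rs.poisson(lam)

n, p = 5, 0.4
mixed = negative_binomial_via_mixture(np.random.RandomState(1), n, p, 200_000)
native = np.random.RandomState(2).negative_binomial(n, p, size=200_000)
# Both sample means should be near n * (1 - p) / p = 7.5
print(np.max(np.abs(direct - derived)), mixed.mean(), native.mean())
```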

I was mostly motivated by making it a bit easier to mechanically replace
uses of randn(), which is probably even more common than normal() and
standard_normal() in unit tests.
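A sketch of that mechanical replacement: `randn(d0, d1, ...)` takes dimensions as separate arguments, whereas `standard_normal` takes one shape tuple. `randn_compat` is a hypothetical shim, not part of NumPy; with the same seed, `RandomState` should give identical arrays either way.

```python
import numpy as np

def randn_compat(rs, *dims):
    """Hypothetical shim: a randn(d0, d1, ...) call expressed via standard_normal()."""
    # randn takes each dimension as a separate argument, while standard_normal
    # takes a single shape tuple (or None for a scalar draw).
    return rs.standard_normal(size=dims if dims else None)

old = np.random.RandomState(42).randn(3, 4)
new = randn_compat(np.random.RandomState(42), 3, 4)
print(np.array_equal(old, new))
```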

> Would it be difficult to keep this list large, given that it should be
> frozen, low-maintenance code?

I admit that I had in mind non-statistical unit tests. That is, tests that
didn't depend on the precise distribution of the inputs.
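To make the distinction concrete, here is a sketch (with hypothetical test names) of non-statistical tests: the random inputs only exercise the code, and the assertions hold whatever values are drawn. It uses `numpy.random.RandomState` only because that is the API under discussion.

```python
import numpy as np

def test_sort_ignores_input_order():
    rs = np.random.RandomState(0)
    x = rs.standard_normal(50)
    shuffled = rs.permutation(x)
    # Holds for any draws at all, so the test does not care which values
    # the generator produces or how they are distributed.
    np.testing.assert_array_equal(np.sort(x), np.sort(shuffled))

def test_clip_stays_in_bounds():
    rs = np.random.RandomState(1)
    x = rs.uniform(-10.0, 10.0, size=100)
    clipped = np.clip(x, -1.0, 1.0)
    # Again a property of np.clip, not of the particular uniform draws
    assert clipped.min() >= -1.0 and clipped.max() <= 1.0

test_sort_ignores_input_order()
test_clip_stays_in_bounds()
```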

Robert Kern