[scikit-learn] NEP: Random Number Generator Policy

josef.pktd at gmail.com
Sat Jun 16 08:54:36 EDT 2018


On Sat, Jun 16, 2018 at 3:59 AM, Robert Kern <robert.kern at gmail.com> wrote:
> I have made a significant revision. In this version, downstream projects
> like scikit-learn should experience significantly less forced churn.
>
> https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst
>
> https://mail.python.org/pipermail/numpy-discussion/2018-June/078252.html
>
> tl;dr RandomState lives! But its distributions are forever frozen. So maybe
> "undead" is more apt. Anyways, RandomState will continue to provide the same
> stream-compatibility that it always has. But it will be internally
> refactored to use the same core uniform PRNG objects that the new
> RandomGenerator distributions class will use underneath (defaulting to the
> current Mersenne Twister, of course). The distribution methods on
> RandomGenerator will be allowed to evolve with numpy versions and get
> better/faster implementations.
>
> Your code can mix the usage of RandomState and RandomGenerator as needed,
> but they can be made to share the same underlying RNG algorithm's state.
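
A minimal sketch of that sharing, assuming NumPy 1.17 or later, where the
class the NEP calls RandomGenerator was released as numpy.random.Generator
and the Mersenne Twister core is numpy.random.MT19937. The function name
mixed_draws is hypothetical and only serves the illustration.

```python
import numpy as np


def mixed_draws(seed):
    """Draw once from a legacy RandomState and once from a Generator
    that both sit on top of the same core bit generator."""
    bit_gen = np.random.MT19937(seed)        # one core uniform PRNG
    legacy = np.random.RandomState(bit_gen)  # frozen legacy distributions
    modern = np.random.Generator(bit_gen)    # distributions free to evolve
    a = legacy.random_sample()  # advances the shared state
    b = modern.random()         # continues from where `legacy` stopped
    return a, b


print(mixed_draws(42))
```

Because both objects consume the same state, draws from one affect what
the other produces next, so the order of calls matters for reproducibility.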


Sounds good to me, and I think it handles all our concerns.

I also think that the issues behind the np.random.* section about the
global state and seed can be revisited for possible deprecation of
convenience features.

One clarifying question, mainly to check that I understand correctly.

In this quote:
"""
Calling numpy.random.seed() thereafter SHOULD just pass the given seed
to the current basic RNG object and not attempt to reset the basic RNG
to the Mersenne Twister. The global RandomState instance MUST be
accessible by the name numpy.random.mtrand._rand
"""

"The current basic RNG object" refers to the global object. AFAIU, it
is possible to change it, i.e. to replace numpy.random.mtrand._rand. Is it?

I never tried that, so I didn't know we could change the global
RandomState, and I thought we would have to replace np.random.seed usage
with a specific RandomState(seed) instance.
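
A minimal sketch of that replacement, assuming the global RandomState is
still the default Mersenne Twister one (nobody has swapped it out). The
seed value and variable names are arbitrary.

```python
import numpy as np

# Global style: np.random.seed reseeds the hidden module-level RandomState.
np.random.seed(12345)
global_draws = np.random.normal(size=3)

# Instance style: the same stream, but the state is owned by the caller.
rng = np.random.RandomState(12345)
local_draws = rng.normal(size=3)

# Both paths use the same legacy stream, so the draws agree.
same_stream = np.array_equal(global_draws, local_draws)
print(same_stream)
```

The instance version keeps working unchanged if some other code reseeds
or draws from the global functions in between.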


In loose analogy:

Matplotlib has a "global" current figure and axes, accessed through gcf
and gca. In statsmodels we avoid any access to and usage of them and only
work with individual figure/axes instances that can be provided by the
user (except for maybe some documentation examples and maybe some
"legacy" code).
(https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/utils.py#L48)

AFAICS, essentially, statsmodels will need a similar policy for
RandomState/RandomGenerator and give up the usage of the global random
instance.
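
A minimal sketch of what such a policy could look like, loosely in the
spirit of scikit-learn's check_random_state. The names as_random_state
and bootstrap_mean are hypothetical and not existing statsmodels API;
unlike check_random_state, this sketch never falls back to the global
instance.

```python
import numbers

import numpy as np


def as_random_state(random_state=None):
    """Hypothetical helper: return a RandomState, never the global one."""
    if random_state is None:
        return np.random.RandomState()  # fresh, OS-seeded instance
    if isinstance(random_state, numbers.Integral):
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        return random_state  # user-provided instance, used as-is
    raise TypeError("random_state must be None, an int, or a RandomState")


def bootstrap_mean(data, n_rep=200, random_state=None):
    """Hypothetical example function: bootstrap replicates of the mean."""
    rng = as_random_state(random_state)
    data = np.asarray(data)
    # Resample row indices with replacement using only the local instance.
    idx = rng.randint(0, len(data), size=(n_rep, len(data)))
    return data[idx].mean(axis=1)


reps = bootstrap_mean([1.0, 2.0, 3.0, 4.0], n_rep=10, random_state=0)
print(reps.shape)
```

The user can pass an int for a reproducible run or share one instance
across several calls, in the same way a figure or axes is passed in.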

Josef

>
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>  that is made terrible by our own mad attempt to interpret it as though it had
>  an underlying truth."
>   -- Umberto Eco
>

