[Numpy-discussion] NEP: Random Number Generator Policy

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jun 4 08:29:26 EDT 2018


On Mon, Jun 4, 2018 at 2:22 AM, Robert Kern <robert.kern at gmail.com> wrote:

> On Sun, Jun 3, 2018 at 10:27 PM <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>>> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>> It may be worth having a look at test suites for scipy, statsmodels,
>>>> scikit-learn, etc. and estimate how much work this NEP causes those
>>>> projects. If the devs of those packages are forced to do large scale
>>>> migrations from RandomState to StableState, then why not instead keep
>>>> RandomState and just add a new API next to it?
>>>>
>>>
>>> Tests that explicitly create RandomState objects would not be difficult
>>> to migrate. The goal of "StableState" is that it could be used directly in
>>> cases where RandomState is currently used in tests, so I would guess that
>>> "RandomState" could be almost mechanically replaced by "StableState".
>>>
>>> The challenging cases are calls to np.random.seed(). If no replacement
>>> API is planned, then these would need to be manually converted to use
>>> StableState instead. This is probably not too onerous (and is a good
>>> cleanup to do anyway) but it would be a bit of work.
>>>
>>
>> I agree with this. Statsmodels mostly uses np.random.seed. That cleanup
>> is planned, but has been postponed so far because it is not a high
>> priority. We will have to do it eventually.
>>
>> The main work will come if StableState doesn't include specific
>> distributions (Poisson, NegativeBinomial, Gamma, ...) and distributions
>> that we don't even use yet, like Beta.
>>
>
> I would posit that it is probably very rare that one uses the full breadth
> of distributions in unit tests. You may be the only one. :-)
>

Given that I'm one of the maintainers for Statistics in Python, I wouldn't
be surprised if I use more distributions than almost anyone else.
However, statsmodels doesn't use a very large set; there are other packages
that use Pareto and Extreme Value distributions or circular distributions
like vonmises which are not yet in statsmodels. I have no idea whether
MCMC packages still rely on numpy.random.

But the main "user" of numpy's random is scipy.stats, which might be using
almost all of the distributions. I don't have a current overview of how
much the scipy.stats unit tests rely on having stable streams for the
available distributions.



>
>
>> I don't want to migrate random number generation for the distributions
>> abandoned by numpy's StableState into statsmodels.
>>
>
> What if we followed Kevin's suggestion and forked off RandomState into its
> own forever-frozen package sooner rather than later? Its intended use
> would be for people with legacy packages that cannot upgrade (other than
> changing some imports) and for unit tests that require precise streams for
> a full breadth of distributions. We would still leave it in numpy.random
> for a deprecation period, but maybe we would be noisy about it sooner and
> remove it sooner than my NEP planned for.
>
> Would that work? I'd be happy to maintain that forked-RandomState for you.
>

It would not be nice to have to add another dependency, but that would work
for statsmodels.

I'm not sure whether the scipy.stats maintainers are fine with it. Given that
scipy already uses RandomState instances instead of the global instance, the
actual change, if the distributions are available, would be to replace
RandomState with StableState in the unit tests, AFAIK.
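
To make the two migration cases concrete, here is a minimal sketch of moving a
test helper from np.random.seed to an explicit RandomState instance. The
helper names (simulate_global, simulate_local), the seed value, and the
Poisson parameters are made up for illustration. StableState does not exist
yet, so the sketch uses RandomState; under the proposal discussed here, the
only further change would presumably be the class name.

```python
import numpy as np


def simulate_global(n):
    # Legacy style: reseeds and draws from numpy's hidden global stream.
    np.random.seed(12345)
    return np.random.poisson(lam=3.0, size=n)


def simulate_local(n, rng=None):
    # Explicit style: the generator is an object owned by the caller.
    # If StableState were added, this default is the one line that
    # would need to change (assumption based on the proposal).
    if rng is None:
        rng = np.random.RandomState(12345)
    return rng.poisson(lam=3.0, size=n)


a = simulate_global(5)
b = simulate_local(5)

# The global functions are backed by a RandomState instance, so the same
# seed gives the same draws either way.
same_stream = bool(np.array_equal(a, b))
```

The point of the second form is that a test no longer depends on, or
disturbs, whatever other code does with the global generator.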

Josef



>
> I would probably still encourage most people to continue to use
> StableRandom for most unit testing.
>
> --
> Robert Kern
>