[Numpy-discussion] Moving NumPy's PRNG Forward
Peter Creasey
p.e.creasey.00 at googlemail.com
Fri Jan 19 16:18:40 EST 2018
> Date: Fri, 19 Jan 2018 23:55:57 +0900
> From: Robert Kern <robert.kern at gmail.com>
>
> tl;dr: I think that our stream-compatibility policy is holding us back, and
> I think we can come up with a way forward with a new policy that will allow
> us to innovate without seriously compromising our reliability.
>
> I propose that we adopt a similar policy. This would immediately resolve
> many of the issues blocking innovation in the random distributions.
> Improvements to the distributions could be made at the same rhythm as
> normal features. No version-selection API would be required as you select
> the version by installing the desired version of numpy. By default,
> everyone gets the latest, best versions of the sampling algorithms.
> Selecting a different core PRNG could be easily achieved as
> ng-numpy-randomstate does it, by instantiating different classes. The
> different incompatible ways to initialize different core PRNGs (with unique
> features like selectable streams and the like) are transparently handled:
> different classes have different constructors. There is no need to jam all
> options for all core PRNGs into a single constructor.
+1
I have a general comment: random streams are probably over-used in
testing. In particular:
1. If you need a small number (hundreds) of random values, you should
just write them out to code or files for repeatability; and
2. If you need a large amount of random data, then maybe you want to
rethink your testing strategy. In particular, if you need millions of
points to hit a few edge cases, then your code is significantly more
mature (low failure rate) than your test suite (inefficient), and you
could probably target your tests a bit more (I'm as guilty of this as
anyone).
Having said that, I don't think NumPy today is in a position to fight
the inertia of point 2: numpy.random has given everyone easy tools to
make large and (nearly always) reproducible sequences, and they get
used a lot. On the other hand, staying on top of performance requires
an active code base, and for a reproducible PRNG almost any change is
a breaking change. Allowing non-backwards-compatible streams
concurrently with the old style seems the logical way forward. So +1.
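For illustration, here is a sketch of what new-style streams running alongside the old style can look like. It assumes NumPy 1.17 or later, which postdates this message: those releases added `np.random.Generator` with core PRNGs (bit generators) such as `PCG64` and `Philox` selected by instantiating different classes, while keeping the legacy `np.random.RandomState` stream frozen. It is not code from this thread, and the seed value is arbitrary.

```python
import numpy as np

SEED = 2018  # arbitrary seed chosen for the example

# Old style: the legacy RandomState stream, kept frozen so that existing
# seeded code keeps producing the same numbers across NumPy versions.
legacy = np.random.RandomState(SEED)
old_draws = legacy.standard_normal(5)

# New style: a Generator wrapping an explicitly chosen core PRNG.
# Its sampling algorithms are not bound by the same strict
# stream-compatibility promise, so they are free to improve.
pcg = np.random.Generator(np.random.PCG64(SEED))
new_draws = pcg.standard_normal(5)

# A different core PRNG is a different class, so PRNG-specific options
# (here Philox's `key`) live in that class's own constructor rather than
# being jammed into one shared constructor.
philox = np.random.Generator(np.random.Philox(key=SEED))
philox_draws = philox.random(3)
```

Reseeding either object with the same seed reproduces its draws for a given NumPy install. Only the legacy object promises the same draws across NumPy versions.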
Best,
Peter