[Numpy-discussion] Moving NumPy's PRNG Forward

Robert Kern robert.kern at gmail.com
Fri Jan 19 17:50:31 EST 2018


On Sat, Jan 20, 2018 at 2:57 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
>
> On Fri, Jan 19, 2018 at 6:57 AM Robert Kern <robert.kern at gmail.com> wrote:
>>
>> As an alternative, we may also want to leave `np.random.RandomState`
>> entirely fixed in place as deprecated legacy code that is never updated.
>> This would allow current unit tests that depend on the stream-compatibility
>> that we previously promised to still pass until they decide to update.
>> Development would move to a different class hierarchy with new names.
>
> I like this alternative, but I would hesitate to call it "deprecated".
> Users who care about exact reproducibility across NumPy versions (e.g., for
> testing) are probably less concerned about performance, and could continue
> to use it.

I would be careful about that because quite a few of the methods are not
stable across platforms, even on the same numpy version. If you want to
declare that some part of the np.random API is stable for such purposes, we
need to curate a subset of the methods anyway. As a one-off thing, this
alternative proposes to declare that all of `np.random.RandomState` is
stable across versions, but we can't guarantee that all of it is
unconditionally stable for exact reproducibility. We can make that guarantee
for a smaller subset of methods. To your point, though, if we
freeze the current `RandomState`, we can make that guarantee for a larger
subset of the methods than we would for the new API. So I guess I talked
myself around to your view, but I would be a bit more cautious in how we
advertise the stability of the frozen `RandomState` API.
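
A minimal sketch of the kind of unit test that depends on this stream
compatibility. The helper name `first_uniform_draws` is hypothetical, and
using `random_sample` here is only an assumption that uniform doubles would
be among the methods in a curated stable subset; the thread does not settle
which methods those are.

```python
import numpy as np

def first_uniform_draws(seed, n):
    # Hypothetical helper: a fresh, explicitly seeded RandomState instance,
    # so the result does not depend on the global np.random state.
    rng = np.random.RandomState(seed)
    return rng.random_sample(n)

draws = first_uniform_draws(0, 3)

# Values that RandomState(0) has historically produced for uniform doubles.
# A test like this passes only as long as the stream stays frozen.
expected = np.array([0.5488135, 0.71518937, 0.60276338])
stream_unchanged = np.allclose(draws, expected, atol=1e-7)
```

A test pinned to hard-coded values like this is exactly what would break if
the stream behind `RandomState` were ever changed.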

> New random number generator classes could implement their own guarantees
> about compatibility across their methods.
>
>> I am personally not at all interested in preserving any stream
>> compatibility for the `numpy.random.*` aliases or letting the user swap out
>> the core PRNG for the global PRNG that underlies them. `np.random.seed()`
>> should be discouraged (if not outright deprecated) in favor of explicitly
>> passing around instances.
>
> I agree that np.random.seed() should be discouraged, but it feels very
> late in NumPy's development to remove it.
>
> If we do alter the random number streams for numpy.random.*, it seems
> that we should probably issue a warning (at least for several major
> versions) whenever numpy.random.seed() is called. This could get pretty
> noisy. I guess that's all the more incentive to switch to random state
> objects.

True. I like that.

The reason I think that it might be worth an exception is that it has been
a moral hazard. People aren't just writing correct but improvable code
(relying on `np.random.*` methods but seeding exactly once at the start of
their single-threaded simulation); they've been writing incorrect and
easily broken code. For example:

    np.random.seed(seed)
    np.random.shuffle(x_train)
    np.random.seed(seed)
    np.random.shuffle(labels_train)
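
The snippet above keeps the two arrays aligned only because the global PRNG
is reseeded between the two shuffles. A minimal sketch of one alternative,
assuming the two arrays have the same length: draw a single permutation from
an explicitly passed-around `RandomState` instance and index both arrays with
it. The function name `shuffle_together` and the sample data are hypothetical.

```python
import numpy as np

def shuffle_together(x_train, labels_train, rng):
    """Return both arrays reordered by one shared permutation drawn from rng."""
    if len(x_train) != len(labels_train):
        raise ValueError("x_train and labels_train must have the same length")
    # One permutation, applied to both arrays, keeps rows and labels paired
    # without touching the global np.random state.
    perm = rng.permutation(len(x_train))
    return x_train[perm], labels_train[perm]

# Illustrative data: row i of x_train starts with 2*i and has label i.
x_train = np.arange(10).reshape(5, 2)
labels_train = np.arange(5)

rng = np.random.RandomState(12345)  # explicit instance instead of np.random.seed()
x_shuffled, labels_shuffled = shuffle_together(x_train, labels_train, rng)
```

Because the pairing comes from the shared index array rather than from two
PRNG streams happening to coincide, it does not depend on any other code
leaving the global state alone.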

--
Robert Kern