<div dir="ltr">On Sat, Jan 20, 2018 at 2:57 AM, Stephan Hoyer &lt;<a href="mailto:shoyer@gmail.com">shoyer@gmail.com</a>&gt; wrote:<br>><br>> On Fri, Jan 19, 2018 at 6:57 AM Robert Kern &lt;<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>&gt; wrote:<br>>><br>>> As an alternative, we may also want to leave `np.random.RandomState` entirely fixed in place as deprecated legacy code that is never updated. This would allow current unit tests that depend on the stream-compatibility that we previously promised to still pass until they decide to update. Development would move to a different class hierarchy with new names.<br>><br>> I like this alternative, but I would hesitate to call it "deprecated". Users who care about exact reproducibility across NumPy versions (e.g., for testing) are probably less concerned about performance, and could continue to use it.<br><br>I would be careful about that because quite a few of the methods are not stable across platforms, even on the same NumPy version. If you want to declare that some part of the `np.random` API is stable for such purposes, we need to curate a subset of the methods anyway. As a one-off thing, this alternative proposes to declare that all of `np.random.RandomState` is stable across versions, but we can't guarantee that all of it is unconditionally stable for exact reproducibility. We can make that guarantee for a smaller subset of methods, though. To your point, if we freeze the current `RandomState`, we can make that guarantee for a larger subset of the methods than we would for the new API.
So I guess I talked myself around to your view, but I would be a bit more cautious in how we advertise the stability of the frozen `RandomState` API.<br><br>> New random number generator classes could implement their own guarantees about compatibility across their methods.<br>><br>>> I am personally not at all interested in preserving any stream compatibility for the `numpy.random.*` aliases or letting the user swap out the core PRNG for the global PRNG that underlies them. `np.random.seed()` should be discouraged (if not outright deprecated) in favor of explicitly passing around instances.<br>><br>> I agree that np.random.seed() should be discouraged, but it feels very late in NumPy's development to remove it.<br>><br>> If we do alter the random number streams for numpy.random.*, it seems that we should probably issue a warning (at least for a several major versions) whenever numpy.random.seed() is called. This could get pretty noisy. I guess that's all the more incentive to switch to random state objects.<br><br>True. I like that.<div><br></div><div>The reason I think that it might be worth an exception is that it has been a moral hazard. People aren't just writing correct but improvable code (relying on `np.random.*` methods but seeding exactly once at the start of their single-threaded simulation) but they've been writing incorrect and easily-broken code. For example:</div><div><br></div><div><div>    np.random.seed(seed)</div><div>    np.random.shuffle(x_train)</div><div>    np.random.seed(seed)</div><div>    np.random.shuffle(labels_train)</div><div><br></div>--<br>Robert Kern</div></div>
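For contrast with the reseed-and-shuffle snippet above, here is a minimal sketch of one way to keep two arrays aligned without touching the global PRNG. The reseeding version only stays aligned if both `shuffle` calls consume the random stream identically, and it resets the global state as a side effect. The helper name `shuffle_in_unison` and the sample data are hypothetical, chosen only for illustration; the sketch assumes both arrays have the same length along the first axis.

```python
import numpy as np


def shuffle_in_unison(x, y, seed=None):
    """Return copies of x and y reordered by one shared permutation.

    Hypothetical helper for illustration: it draws a single permutation
    from a local RandomState instance instead of reseeding the global PRNG.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length along axis 0")
    rng = np.random.RandomState(seed)  # local instance; global state is untouched
    perm = rng.permutation(len(x))     # one index permutation shared by both arrays
    return x[perm], y[perm]


# Illustrative data: row i of x_train starts with 2*i, and its label is i.
x_train = np.arange(10).reshape(5, 2)
labels_train = np.arange(5)

x_shuf, labels_shuf = shuffle_in_unison(x_train, labels_train, seed=1234)
```

Because both arrays are indexed by the same permutation, the pairing between rows and labels holds by construction rather than depending on how `shuffle` happens to draw from the stream.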