On Sun, May 24, 2015 at 10:22 AM, Antony Lee <antony.lee@berkeley.edu> wrote:
Hi,

As mentioned in

#1450: Patch with Ziggurat method for Normal distribution
#5158: ENH: More efficient algorithm for unweighted random choice without replacement
#5299: using `random.choice` to sample integers in a large range
#5851: Bug in np.random.dirichlet for small alpha parameters

some methods on np.random.RandomState either are implemented non-optimally (#1450, #5158, #5299) or have outright bugs (#5851), but cannot easily be changed because of backwards-compatibility concerns.  While some have suggested adding new methods that deprecate the old ones (see e.g. #5872), some consensus has formed around the following ideas (see #5299 for the original discussion, which was followed by private discussions with @njsmith):

- Backwards compatibility should only be provided to those who explicitly instantiate a seeded RandomState object, or reseed a RandomState object to a given value, and then draw variates from it.  Using the global methods (or a None-seeded RandomState) was already non-reproducible anyway, since e.g. other libraries could be drawing variates from the global RandomState (the free functions in np.random are actually methods of it).  Thus, the global RandomState object should use the latest implementation of the methods.

The rest of the proposal looks good to me, but the reasoning on this point is shaky. np.random.seed() is *very* widely used, and works fine for a test suite where each test that needs random numbers calls seed(...) and is run with nose. Can you explain why you need to touch the behavior of the global methods in order to make RandomState(version=) work?

Ralf


- "RandomState(seed)" and "r = RandomState(...); r.seed(seed)" should offer backwards-compatibility guarantees (see e.g. https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).
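
To illustrate the distinction drawn in the two points above, here is a small sketch using the existing np.random API.  The function third_party_helper is a made-up stand-in for any library code that happens to draw from the global RandomState; everything else is current NumPy.

```python
import numpy as np


def third_party_helper():
    # Hypothetical stand-in for a library that draws from the global state.
    np.random.rand()


# Global state: an intervening draw elsewhere shifts the stream we see.
np.random.seed(0)
global_plain = np.random.rand()
np.random.seed(0)
third_party_helper()
global_disturbed = np.random.rand()

# Private seeded RandomState: unaffected by draws from the global state.
rs = np.random.RandomState(0)
private_plain = rs.rand()
rs = np.random.RandomState(0)
third_party_helper()
private_disturbed = rs.rand()

print(global_plain == global_disturbed)    # False
print(private_plain == private_disturbed)  # True
```

The seeded global state is still reproducible as long as nothing else draws from it between the call to seed() and the draws of interest, which is the case the explicit RandomState object does not have to worry about.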

As such, we propose the following improvements to the API:

- RandomState gains a (keyword-only) parameter, "version", also accessible as a read-only attribute.  This indicates the version of the methods on the object.  The current implementation of RandomState is retroactively assigned version 0.  The latest version is available as np.random.LATEST_VERSION.  Backwards-incompatible improvements to RandomState methods can be introduced, but each one increases LATEST_VERSION.

- The global RandomState is instantiated as RandomState(version=LATEST_VERSION).

- RandomState() and rs.seed() set the version to LATEST_VERSION.

- RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version to 0.
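
The version rules listed above can be sketched roughly as follows.  This is only an illustration, not the implementation in #5911: the names VersionedState, _pick_version and GLOBAL_STATE are made up, LATEST_VERSION = 1 is an assumed value, and the standard library's random.Random stands in for the real generator state.  The sketch also assumes that an explicitly passed version takes precedence over the seed-based default, which the proposal does not spell out.

```python
# Hypothetical sketch of the proposed version rules; none of this is NumPy API.
import random

LATEST_VERSION = 1  # assumed value, for illustration only


class VersionedState:
    """Stand-in for the proposed RandomState; models only version bookkeeping."""

    def __init__(self, seed=None, *, version=None):
        # random.Random is a placeholder for the real generator state.
        self._rng = random.Random(seed)
        self._version = self._pick_version(seed, version)

    @staticmethod
    def _pick_version(seed, version):
        if version is not None:
            # Assumption: an explicit version wins over the defaults below.
            if not 0 <= version <= LATEST_VERSION:
                raise ValueError(f"unknown version {version}")
            return version
        # Unseeded objects get the latest methods; seeded ones stay on 0.
        return LATEST_VERSION if seed is None else 0

    @property
    def version(self):
        # Read-only attribute: there is deliberately no setter.
        return self._version

    def seed(self, seed=None):
        # Reseeding re-applies the same default rule as the constructor.
        self._rng.seed(seed)
        self._version = self._pick_version(seed, None)


# The global object would be created with the latest version.
GLOBAL_STATE = VersionedState(version=LATEST_VERSION)
```

With these definitions, VersionedState().version is LATEST_VERSION, VersionedState(42).version is 0, and calling seed(42) on an unseeded object moves it back to version 0.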

A proof-of-concept implementation, still missing tests, is tracked as #5911.  It includes the patch proposed in #5158 as an example of how to include an improved version of random.choice.

Comments and help with writing tests (in particular, tests that make sure backwards compatibility is maintained) are welcome.

Antony Lee

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion