[Numpy-discussion] Proposal: numpy.random.random_seed

Tue May 17 04:09:28 EDT 2016

On Tue, May 17, 2016 at 12:18 AM, Robert Kern <robert.kern at gmail.com> wrote:

> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
> > 1. When writing a library of stochastic functions that take a seed as an
> > input argument, and some of these functions call multiple other such
> > stochastic functions. Dask is one such example [1].
>
> Can you clarify the use case here? I don't really know what you are doing
> here, but I'm pretty sure this is not the right approach.
>

Here's a contrived example. Suppose I've written a simulator for cars that
consists of a number of loosely connected components (e.g., an engine,
brakes, etc.). The behavior of each component of our simulator is
stochastic, but we want everything to be fully reproducible, so we need to
use seeds or RandomState objects.

We might write our simulate_car function like the following:

def simulate_car(engine_config, brakes_config, seed=None):
    rs = np.random.RandomState(seed)
    engine = simulate_engine(engine_config, seed=rs.random_seed())
    brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
    ...

The problem with passing the same RandomState object (either explicitly or
by dropping the seed argument entirely and using the global state) to both
simulate_engine and simulate_brakes is that it breaks encapsulation -- if I
change what I do inside simulate_engine, it also affects the brakes, because
both components draw from the same stream.
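
To make the encapsulation point concrete, here is a minimal sketch that
emulates the proposed random_seed() with the existing randint method. The
helper draw_seed and the bodies of simulate_engine and simulate_brakes are
hypothetical stand-ins for illustration, not part of numpy:

```python
import numpy as np


def draw_seed(rs):
    # Hypothetical stand-in for the proposed rs.random_seed():
    # draw an integer that is valid as a seed for another RandomState.
    return int(rs.randint(0, 2**31 - 1))


def simulate_engine(n_draws, seed=None):
    # Placeholder component: consumes n_draws numbers from its own stream.
    rs = np.random.RandomState(seed)
    return rs.normal(size=n_draws)


def simulate_brakes(n_draws, seed=None):
    # Placeholder component with its own, separately seeded stream.
    rs = np.random.RandomState(seed)
    return rs.uniform(size=n_draws)


def simulate_car(engine_draws, brakes_draws, seed=None):
    rs = np.random.RandomState(seed)
    # Both child seeds are drawn up front, so the brakes' seed does not
    # depend on how many numbers the engine consumes.
    engine_seed = draw_seed(rs)
    brakes_seed = draw_seed(rs)
    engine = simulate_engine(engine_draws, seed=engine_seed)
    brakes = simulate_brakes(brakes_draws, seed=brakes_seed)
    return engine, brakes
```

With this layout, changing how many draws the engine makes leaves the
brakes' output unchanged for a fixed top-level seed, which is the isolation
that a shared RandomState cannot give.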

The dask use case is actually pretty different -- the intent is to create
many random numbers in parallel using multiple threads or processes
(possibly in a distributed fashion). I know that skipping ahead is the
standard way to get independent number streams for parallel sampling, but
that isn't exposed in numpy.random, and setting distinct seeds seems like a
reasonable alternative for scientific computing use cases.
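
As a rough sketch of the distinct-seeds idea (this is not dask's actual
implementation), one parent RandomState can hand out one seed per chunk of
work. The names make_worker_seeds and sample_chunk are hypothetical, and
the chunks are computed sequentially here only to keep the example short:

```python
import numpy as np


def make_worker_seeds(seed, n_workers):
    # Derive one integer seed per worker from a single top-level seed.
    # Assumption: distinct seeds are "independent enough"; this does not
    # guarantee non-overlapping streams the way skipping ahead would.
    rs = np.random.RandomState(seed)
    return [int(s) for s in rs.randint(0, 2**31 - 1, size=n_workers)]


def sample_chunk(chunk_size, seed):
    # Each worker builds its own RandomState, so no state is shared.
    return np.random.RandomState(seed).normal(size=chunk_size)


seeds = make_worker_seeds(42, n_workers=4)
# Each call below could be sent to a separate thread or process.
chunks = [sample_chunk(1000, s) for s in seeds]
```

Because every chunk depends only on its own seed, the result is the same
regardless of the order in which the workers run.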

> It's only pseudo-private. This is an authorized use of it.
>
> However, for this case, I usually just pass around the numpy.random
> module itself and let duck-typing take care of the rest.
>

I like the duck-typing approach. That's very elegant.
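
A minimal sketch of that pattern, as I understand it: the function accepts
anything that exposes the same sampling methods, so either the numpy.random
module or a RandomState instance works. The function noisy_measurement and
its rng parameter are hypothetical names used only for illustration:

```python
import numpy as np


def noisy_measurement(value, rng=np.random):
    # rng may be the numpy.random module (global state) or a RandomState;
    # both provide a normal() method with the same signature.
    return value + rng.normal(scale=0.1)


# Uses the global state via the module itself.
unseeded = noisy_measurement(1.0)

# Uses an explicit, reproducible RandomState.
seeded = noisy_measurement(1.0, rng=np.random.RandomState(0))
```

The caller decides whether to be reproducible, and the function itself
never has to touch the global RandomState object directly.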

If this is an authorized use of the global RandomState object, let's
document it! Otherwise cautious library maintainers like myself will
discourage using it :).

> > [3] On a side note, if there's no longer a good reason to keep this
> > object private, perhaps we should expose it in our public API. It would
> > certainly be useful -- scikit-learn is already using it (see links in the
> > pandas PR above).
>
> Adding a public get_global_random_state() function might be in order.
>

Yes, possibly.