[Numpy-discussion] [Scikit-learn-general] random number generator, entropy and pickling

Robert Kern robert.kern at gmail.com
Mon Apr 25 13:49:50 EDT 2011

On Mon, Apr 25, 2011 at 11:57, Gael Varoquaux
<gael.varoquaux at normalesup.org> wrote:
> Hi there,
> We are currently having a discussion on the scikits.learn mailing list
> about which patterns to adopt for random number generation. One thing
> that is absolutely clear is that making the stream of random numbers
> reproducible is critical. We have several objects that can serve as random
> variate generators. So far, we instantiate these objects with an optional
> seed or PRNG argument, as in:
>    def __init__(self, prng=None):
>        if prng is None:
>            prng = np.random
>        self.prng = prng
> The problem with this pattern is that np.random doesn't pickle, and
> therefore the objects do not pickle by default. A bit of pickling magic
> would solve this, but we'd rather avoid it.
> We thought that we could simply have a PRNG per object, as in:
>    def __init__(self, prng=None):
>        if prng is None:
>            prng = np.random.RandomState()
>        self.prng = prng
> I don't like this option, because it means that with a given piece of
> code, setting the seed of numpy's PRNG isn't enough to make it
> reproducible.
> I couldn't retrieve a handle on a picklable instance for the global PRNG.

It's accessible as np.random.mtrand._rand, though we have kept it
"private" for a reason. The Option (a) from the original thread on
scikits-learn-general, "use your own default global RandomState
instance in scikits.learn", would be preferable.

> The only option I can see would be to use the global numpy PRNG to seed
> an instance specific RandomState, as in:
>    def __init__(self, prng=None):
>        if prng is None:
>            prng = np.random.RandomState(np.random.randint(2**32))
>        self.prng = prng
> That way seeding the global PRNG really does control the full random
> number generation. I am wondering if it would have an adverse consequence
> on the entropy of the stream of random numbers. Does anybody have
> suggestions? Advice?

Use a single, common default PRNG, either np.random.mtrand._rand or
your own. Don't use multiple seeds from a PRNG. Use a utility function
to avoid repeating yourself, even if it's just a one-liner. In this
case, it's important that everyone do exactly the same thing for
consistency, both inside scikits.learn and in code that uses or
extends scikits.learn. The best way to ensure that is to provide a
utility function as the One, Obvious Way To Do It. Note that if you do
hide the details behind a utility function, I would remove my
objection to using np.random.mtrand._rand. ;-)
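Such a utility function might be sketched as follows; the name
`check_random_state` and the exact set of accepted inputs are my
assumptions, not something specified in the thread:

```python
import numbers
import numpy as np

def check_random_state(seed):
    """Turn `seed` into a np.random.RandomState instance.

    None        -> the shared global RandomState instance
    int         -> a new RandomState seeded with that int
    RandomState -> passed through unchanged
    """
    if seed is None:
        # The single, common default PRNG everyone shares.
        return np.random.mtrand._rand
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)
```

Every constructor then reduces to `self.prng = check_random_state(prng)`,
so all code inside and outside the package does exactly the same thing.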

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
