[Numpy-discussion] Moving NumPy's PRNG Forward

Fri Jan 19 18:12:56 EST 2018

On Fri, Jan 19, 2018 at 6:55 AM, Robert Kern <robert.kern at gmail.com> wrote:
[...]
> There seems to be a lot of pent-up motivation to improve on the random
> number generation, in particular the distributions, that has been blocked by
> our policy. I think we've lost a few potential first-time contributors that
> have run up against this wall. We have been pondering ways to allow for
> adding new core PRNGs and improving the distribution methods while maintaining
> stream-compatibility for existing code. Kevin Sheppard, in particular, has
> been working hard to implement new core PRNGs with a common API.
>
>   https://github.com/bashtage/ng-numpy-randomstate
>
> Kevin has also been working to implement the several proposals that have
> been made to select different versions of distribution implementations. In
> particular, one idea is to pass something to the RandomState constructor to
> select a specific version of distributions (or switch out the core PRNG).
> Note that to satisfy the policy, the simplest method of seeding a
> RandomState will always give you the oldest version: what we have now.
>
> Kevin has recently come to the conclusion that it's not technically feasible
> to add the version-selection at all if we keep the stream-compatibility
> policy.
>
>   https://github.com/numpy/numpy/pull/10124#issuecomment-350876221
>
> I would argue that our current policy isn't providing the value that it
> claims to.

I agree that relaxing our policy would be better than the status quo.
Before making any decisions, though, I'd like to make sure we
understand the alternatives and their trade-offs. Specifically, I
think the main alternative would be the following approach to
versioning:

1) make RandomState's state be a tuple (underlying RNG algorithm,
underlying RNG state, distribution version)
2) zero-argument initialization/seeding, like RandomState() or
rstate.seed(), sets the state to: (our recommended RNG algorithm,
os.urandom(...), version=LATEST_VERSION)
3) for backcompat, single-argument seeding like RandomState(123) or
rstate.seed(123), sets the state to: (Mersenne Twister,
expand_mt_seed(123), version=0)
4) also allow seeding to explicitly control all the parameters, like
RandomState(PCG_XSL_RR(123), version=12) or whatever
5) the distribution functions are implemented like:

def normal(self, *args, **kwargs):
    if self.version < 3:
        return self._normal_box_muller(*args, **kwargs)
    elif self.version < 8:
        return self._normal_ziggurat_v1(*args, **kwargs)
    else:  # version >= 8
        return self._normal_ziggurat_v2(*args, **kwargs)
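
To make the seeding rules in points 1-4 concrete, here is a rough,
self-contained sketch of how they could fit together. It is only an
illustration under assumptions, not a proposed implementation: the
class name VersionedRandomState, the CoreRNG stand-in, the value of
LATEST_VERSION, the algorithm labels and the normal_impl() helper are
all made up for this example, and the raw integer seed stands in for
expand_mt_seed(). No actual sampling is done; normal_impl() just
reports which branch of the dispatch above would be taken.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for a core-PRNG object such as PCG_XSL_RR(123).
CoreRNG = namedtuple("CoreRNG", ["name", "seed"])

LATEST_VERSION = 12  # made-up number; the post does not fix one


class VersionedRandomState:
    """Toy model: state is (algorithm, rng_state, distribution version)."""

    def __init__(self, seed=None, version=None):
        self.seed(seed, version)

    def seed(self, seed=None, version=None):
        if isinstance(seed, CoreRNG) and version is not None:
            # Point 4: the caller controls every parameter explicitly.
            state = (seed.name, seed.seed, version)
        elif seed is None and version is None:
            # Point 2: recommended algorithm, OS entropy, newest distributions.
            state = ("recommended", os.urandom(16), LATEST_VERSION)
        elif isinstance(seed, int) and version is None:
            # Point 3: legacy path; the raw seed stands in for expand_mt_seed(seed).
            state = ("mt19937", seed, 0)
        else:
            # Combinations the post does not describe are rejected in this sketch.
            raise TypeError("unsupported seed/version combination")
        self.algorithm, self.rng_state, self.version = state

    def normal_impl(self):
        """Name of the normal() implementation the version selects."""
        if self.version < 3:
            return "box_muller"
        elif self.version < 8:
            return "ziggurat_v1"
        return "ziggurat_v2"
```

With this sketch, VersionedRandomState(123) lands on version 0 and the
Box-Muller branch, VersionedRandomState() gets LATEST_VERSION, and
VersionedRandomState(CoreRNG("pcg_xsl_rr", 123), version=12) picks the
newest ziggurat branch.
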

Advantages: fully backwards compatible; preserves the compatibility
guarantee (such as it is); users who use the default seeding
automatically get the highest speed and quality.

Disadvantages: users who specify seeds explicitly get old/slow
distributions (but of course that's the point of compatibility); we
have to keep the old distribution code around forever (but this is not
too hard; it just sits in some function and we never touch it).

Kevin, is this the version that you think is non-viable? Is the above
a good description of the advantages/disadvantages?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org