[Numpy-discussion] Proposal: numpy.random.random_seed

Robert Kern robert.kern at gmail.com
Wed May 18 13:56:50 EDT 2016


On Wed, May 18, 2016 at 6:20 PM, <josef.pktd at gmail.com> wrote:
>
> On Wed, May 18, 2016 at 12:01 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>
>> On Wed, May 18, 2016 at 4:50 PM, Chris Barker <chris.barker at noaa.gov> wrote:
>> >>
>> >> > ...anyway, the real reason I'm a bit grumpy is because there are
>> >> > solid engineering reasons why users *want* this API,
>> >
>> > Honestly, I am lost in the math -- but like any good engineer, I want
>> > to accomplish something anyway :-) I trust you guys to get this right
>> > -- or at least document what's "wrong" with it.
>> >
>> > But, if I'm reading the use case that started all this correctly, it
>> > closely matches my use case. That is, I have a complex model with
>> > multiple independent "random" processes. And we want to be able to
>> > reproduce simulations EXACTLY -- our users get confused when the
>> > results are "different" even if in a statistically insignificant way.
>> >
>> > At the moment we are using one RNG, with one seed for everything. So
>> > we get reproducible results, but if one thing is changed, then the
>> > entire simulation is different -- which is OK, but it would be nicer
>> > to have each process using its own RNG stream with its own seed.
>> > However, it matters not one whit if those seeds are independent --
>> > the processes are different, you'd never notice if they were using
>> > the same PRN stream -- because they are used differently. So a
>> > "fairly low probability of a clash" would be totally fine.
>>
>> Well, the main question is: do you need to be able to spawn dependent
>> streams at arbitrary points to an arbitrary depth without coordination
>> between processes? The necessity for multiple independent streams per
>> se is not contentious.
>
> I'm similar to Chris, and didn't try to figure out the details of what
> you are talking about.
>
> However, if there are functions getting into numpy that help in using a
> best practice, even if it's not bulletproof, then it's still better than
> homemade approaches.
> If it gets in soon, then we can use it in a few years (given dependency
> lag). At that point there should be more distributed, nested
> simulation-based algorithms where we don't know in advance how far we
> have to go to get reliable numbers or convergence.
>
> (But I don't see anything like that right now.)

Current best practice is to use PRNGs with settable streams (or fixed
jumpahead for those PRNGs cursed to not have settable streams but blessed
to have super-long periods). The way to get those into numpy is to help
Kevin Sheppard finish:

  https://github.com/bashtage/ng-numpy-randomstate

He's done nearly all of the hard work already.
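
For concreteness, here is a rough sketch of what "settable streams" and
"fixed jumpahead" can look like in code. It assumes the
numpy.random.Generator / bit generator interface of NumPy 1.17 and later,
which was not available in numpy when this thread was written. The helper
names and the process names (wind, waves, spill) are hypothetical, chosen
only to mirror the one-stream-per-process use case quoted above.

```python
import numpy as np


def jumped_streams(seed, n):
    """Fixed jumpahead: carve n generators out of one long PCG64 sequence."""
    base = np.random.PCG64(seed)
    # jumped(k) returns a new bit generator advanced by k large fixed
    # strides, so each generator starts far away from the others.
    return [np.random.Generator(base.jumped(k)) for k in range(1, n + 1)]


def keyed_streams(first_key, n):
    """Settable streams: each Philox key selects a different stream."""
    return [
        np.random.Generator(np.random.Philox(key=first_key + k))
        for k in range(n)
    ]


# One generator per simulated process (hypothetical process names).
wind, waves, spill = jumped_streams(seed=2016, n=3)
print(wind.random(3), waves.random(3), spill.random(3))
```

With this layout, drawing more or fewer numbers in one process should not
change what the other processes see, and rerunning with the same seed or
keys should reproduce each stream exactly.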

--
Robert Kern