numpy performance and random numbers

Sat Dec 19 19:55:48 EST 2009

On Sat, 19 Dec 2009 13:58:37 -0800, sturlamolden wrote:

> On 19 Des, 21:26, Lie Ryan <lie.1... at gmail.com> wrote:
> 
>> you can just start two PRNG at two distinct states
> 
> No. You have to know for certain that the outputs don't overlap.

"For certain"? Why?

Presumably you never do a Monte Carlo simulation once, you always repeat 
the simulation. Does it matter if (say) one trial in fifty is lower-
quality than the rest because the outputs happened to overlap? What if 
it's one trial in fifty thousand? So long as the probability of overlap 
is sufficiently small, you might not even care about doing repeated 
simulations.

> If you pick two random states (using any PRNG), you need error- checking
> that states are always unique, i.e. that each PRNG never reaches the
> starting state of the other(s). If I had to do this, I would e.g. hash
> the state to a 32 bit integer. You can deal with errors by jumping to a
> different state or simply aborting the simulation. The problem is that
> the book-keeping will be costly compared to the work involved in
> generating a single pseudorandom deviate. So while we can construct a
> parallel PRNG this way, it will likely run slower than one unique PRNG.

Since truly random sequences sometimes produce repeating sequences, you 
should be able to tolerate short periods of overlap. I'd consider the 
following strategy: for each of your parallel PRNGs other than the first, 
periodically jump to another state unconditionally. The periods should be 
relatively prime to each other, e.g. the second PRNG jumps-ahead after 
(say) 51 calls, the third after 37 calls, etc. (the exact periods 
shouldn't be important). Use a second, cheap, PRNG to specify how far to 
jump ahead. The overhead should be quite small: a simple counter and test 
for each PRNG, plus a small number of calls to a cheap PRNG, and 
statistically you should expect no significant overlap.

Is this workable?

-- 
Steven