[Numpy-discussion] skip samples in random number generator

Robert Kern robert.kern at gmail.com
Thu Oct 2 13:14:47 EDT 2014

On Thu, Oct 2, 2014 at 5:28 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On 2 Oct 2014 16:52, "Robert Kern" <robert.kern at gmail.com> wrote:
>> On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran <bburan at alum.mit.edu> wrote:
>> > Given the following:
>> >
>> > from numpy import random
>> > rs = random.RandomState(seed=1)
>> > # skip the first X billion samples
>> > x = rs.uniform(0, 10)
>> >
>> > How do I accomplish "skip the first X billion samples" (e.g. 7.2
>> > billion)?  I see that there's a numpy.random.RandomState.set_state
>> > which accepts (among other parameters) a value called "pos".  This
>> > sounds promising, but the other parameters I'm not sure how to compute
>> > (e.g. the 1D array of 624 unsigned integers, etc.).  I need to be able
>> > to skip ahead in the sequence to reproduce some signals that were
>> > generated for experiments.  I could certainly consume and discard the
>> > first X billion samples; however, that seems to be computationally
>> > inefficient.
>> Unfortunately, it requires some significant number-theoretical
>> precomputation for any given N number of steps that you want to skip.
>> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html
> If someone really wanted this functionality then I suppose it would be
> possible to precompute the special jump coefficients for lengths 2, 4, 8,
> 16, 32, ..., and then perform arbitrary jumps using a sequence of smaller
> jumps. (The coefficient table could be shipped with the source code.)

No one needs small jumps of arbitrary size. The real use case for
jumping is to make N parallel streams that won't overlap. You pick a
number, let's call it `jump_steps`, much larger than any single run of
your system could possibly consume (i.e. the number of core PRNG
variates pulled is << `jump_steps`). Then you can initialize N
parallel streams by initializing RandomState once with a seed, storing
that RandomState, then jumping ahead by `jump_steps` and storing *that*
RandomState, then by `2*jump_steps`, etc., to get N RandomState streams
that will not overlap. Give those to your separate processes and let
them draw independently.
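[Editor's note: a minimal sketch of the parallel-stream scheme described above. It assumes the `PCG64.jumped()` API added in NumPy 1.17, which postdates this 2014 thread; the Mersenne Twister behind RandomState never grew a public jump-ahead method in NumPy, but the newer PCG64 bit generator implements exactly this fixed-size-jump pattern.]

```python
import numpy as np

n_streams = 4
base = np.random.PCG64(seed=1)

# Each .jumped(i) returns a fresh bit generator advanced by i * 2**127
# steps, so the streams cannot overlap as long as no single stream
# draws anywhere near 2**127 variates.
streams = [np.random.Generator(base.jumped(i)) for i in range(n_streams)]

# Hand one Generator to each worker process and let it draw independently.
samples = [g.uniform(0, 10, size=5) for g in streams]
```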

So the alternative may actually be to just generate and distribute
*one* set of these jump coefficients for a really big jump size, one
that still leaves you enough space for a really large number of streams
(fortunately, 2**19937-1 is a really big number).

Robert Kern
