[Numpy-discussion] random seed replicate 2d randn with 1d loop

josef.pktd at gmail.com josef.pktd at gmail.com
Mon May 23 15:47:59 EDT 2011


On Mon, May 23, 2011 at 3:27 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 05/23/2011 02:02 PM, Robert Kern wrote:
>> On Mon, May 23, 2011 at 13:33,<josef.pktd at gmail.com>  wrote:
>>> I have a function in two versions, one vectorized, one with loop
>>>
>>> the vectorized function  gets all randn variables in one big array
>>> rvs = distr.rvs(args, **{'size':(nobs, nrep)})
>>>
>>> the looping version has:
>>>     for irep in xrange(nrep):
>>>         rvs = distr.rvs(args, **{'size':nobs})
>>>
>>> the rest should be identical (except for vectorization)
> What happened to your 'irep' and 'nrep' variables in the vectorized and
> looping versions, respectively?
> The looping version is overwriting 'rvs' but the vectorized version is
> not unless you are accumulating it elsewhere. (If so then there is
> another difference..).

Yes. In the loop case, I accumulate only what I need and throw away
the random numbers immediately.
In the vectorized version, I build and use the entire (nrep, nobs) =
(10000, 200) or (10000, 1000) array at once, which is faster than the
loop.
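
A minimal sketch of the comparison, using NumPy's RandomState directly
rather than a scipy distribution (for which, as noted below, there is no
general guarantee). The names seed, vectorized, looped and alt are
illustrative only, and the sizes are shrunk so the arrays are easy to
inspect. It assumes the replicate index is the first axis, i.e. shape
(nrep, nobs), and that the array is filled in C (row-major) order, which
is observed behaviour rather than a documented promise:

```python
import numpy as np

nobs, nrep = 5, 3   # small illustrative sizes, not the (10000, 200) above
seed = 12345        # arbitrary seed chosen for this sketch

# Vectorized: draw every replicate in one call, one replicate per row.
rng = np.random.RandomState(seed)
vectorized = rng.standard_normal(size=(nrep, nobs))

# Loop: re-seed, then draw one replicate of length nobs per iteration.
rng = np.random.RandomState(seed)
looped = np.empty((nrep, nobs))
for irep in range(nrep):
    looped[irep] = rng.standard_normal(size=nobs)

# True when the 2d array is filled row by row from the same stream.
same = np.array_equal(vectorized, looped)

# With size=(nobs, nrep) the stream is the same, but replicate irep is
# no longer column irep; only the flattened sequences line up.
rng = np.random.RandomState(seed)
alt = rng.standard_normal(size=(nobs, nrep))
flat_match = np.array_equal(alt.ravel(), looped.ravel())
```

If the vectorized code uses size=(nobs, nrep) as in the quoted snippet,
the loop draws correspond to alt.ravel().reshape(nrep, nobs), not to the
columns of alt.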

> Also, I try to avoid having variables the same name as a function just
> in case (old habits die hard).

rvs is a method; I never have a function called rvs, and it's one of
my favorite names for an array of random variables during testing.

>
>>> Is there a guarantee that the 2d arrays are filled up in a specific
>>> order so that the loop and vectorized version produce the same result,
>>> given the same seed?
>> No general guarantee for all of the scipy distributions, no. I suspect
>> that all of the RandomState methods do work this way, though.
>>
> You have to guarantee that the complete stream of random numbers is not
> interrupted, for example by additional calls or a reset between loops. That
> probably means generating and storing all of the numbers at once and then
> just accessing that array as needed.
>
> But then again, if you are doing bootstrapping, it really should not
> matter if you do 'sufficient' resamples.

I won't rely on it later on, and I don't think it makes a difference,
but getting identical results across different implementations is very
useful for testing.
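
For a test that does not depend on fill order at all, Bruce's suggestion
can be sketched as follows: draw the whole array once and let the loop
version index into it. The functions stat_vectorized and stat_loop are
hypothetical stand-ins for the two implementations (a row mean is used
only as a placeholder statistic), not the actual code discussed here:

```python
import numpy as np

def stat_vectorized(rvs):
    # Placeholder statistic: mean of each replicate (row) in one call.
    return rvs.mean(axis=1)

def stat_loop(rvs):
    # Same statistic, computed one replicate at a time from the stored draws.
    nrep = rvs.shape[0]
    out = np.empty(nrep)
    for irep in range(nrep):
        out[irep] = rvs[irep].mean()
    return out

nobs, nrep = 200, 50                      # illustrative sizes
rng = np.random.RandomState(0)            # arbitrary seed for the sketch
rvs = rng.standard_normal(size=(nrep, nobs))

res_vec = stat_vectorized(rvs)
res_loop = stat_loop(rvs)

# Raises AssertionError if the two implementations disagree.
np.testing.assert_allclose(res_vec, res_loop)
```

This costs the memory of the full array, so it suits tests with small
nrep rather than the production loop that discards each draw.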

Josef

>
>
> Bruce