[Numpy-discussion] Sampling from the multivariate normal

Wed Nov 9 12:39:01 EST 2011

Hi Oliver, Robert,

Thanks to both of you for your speedy and enlightening responses. They 
really cleared things up.

Now my simulations seem to work as intended.

All the best,
Josh

On 11/9/11 12:14 PM, Robert Kern wrote:
> On Wed, Nov 9, 2011 at 16:40, Joshua Anthony Reyes
> <joshua.reyes at gmail.com>  wrote:
>> Hi List,
>>
>> I'm new to Numpy and I'm a little confused about the behavior of
>> numpy.random.multivariate_normal(). I'm not sure I'm passing the
>> variances correctly. My goal is to sample from a bivariate normal, but
>> the kooky behavior shows up when I sample from a univariate distribution.
>>
>> In short, the multivariate normal function doesn't seem to give me
>> values in the appropriate ranges. Here's an example of what I mean.
>>
>> In[1]: from numpy.random import normal, multivariate_normal
>>
>> In [2]: normal(100, 100, 10)
>> Out[2]:
>> array([ 258.62344586,   70.16378815,   49.46826997,   49.58567724,
>>          182.68652256,  226.67292034,   92.03801549,   18.2686146 ,
>>           94.09104313,   80.35697507])
> Here, you are asking for a Gaussian distribution with a mean of 100
> and a stdev of 100 (or equivalently, a variance of 10000).
>
>> The samples look about right to me. But then when I try to do the same
>> using the multivariate_normal, the values it draws look too close to the
>> mean.
>>
>> In [3]: multivariate_normal([100], [[100]], 10)
> Here you are asking for a Gaussian with a mean of 100 and a variance
> of 100 (or equivalently, a stdev of 10). Note the differences between
> the two signatures:
>
>    np.random.normal(mean, stdev)
>    np.random.multivariate_normal(mean, covariance)
>
>> Out[3]:
>> array([[ 109.10083984],
>>         [  97.43526716],
>>         [ 108.43923772],
>>         [ 97.87345947],
>>         [ 103.405562  ],
>>         [ 110.2963736 ],
>>         [ 103.96445585],
>>         [ 90.58168544],
>>         [  91.20549222],
>>         [ 104.4051761 ]])
>>
>> These values all fall within 10 units of the mean.
>>
>> In [4]: multivariate_normal([100], [[1000]], 10)
>> Out[4]:
>> array([[  62.04304611],
>>         [ 123.29364557],
>>         [ 83.53840083],
>>         [ 64.67679778],
>>         [ 127.82433157],
>>         [  11.3305656 ],
>>         [  95.4101701 ],
>>         [ 126.53213908],
>>         [ 104.68868736],
>>         [  32.45886112]])
>>
>> In [5]: normal(100, 1000, 10)
>> Out[5]:
>> array([ 1052.93998938, -1254.12576419,   258.75390045,   688.32715327,
>>            -2.36543806, -1570.54842269,   472.90045029,   394.62908014,
>>           137.10320437,  1741.85017871])
>>
>> And just to exaggerate things a little more:
>>
>> In [6]: multivariate_normal([100], [[10000]], 10).T][0]
>> Out[6]:
>> array([ 274.45446694,   85.79359682,  245.03248721,  120.10912405,
>>          -34.76526896,  134.93446664,   47.6768889 ,   93.34140984,
>>           80.27244669,  229.64700591])
>>
>> Whereas
>> In [7]: normal(100, 10000, 10)
>> Out[7]:
>> array([  -554.68666687,   3724.59638363, -14873.55303901,  -3111.22162495,
>>         -10813.66412901,   4688.81310356,  -9510.92470735, -12689.02667559,
>>         -10379.01381925,  -4534.60480031])
>>
>> I'm shocked that I don't get some negative values in Out[4]. And Out[6]
>> really ought to have some numbers in the thousands.
>>
>> I'd totally be willing to believe that I don't understand the
>> multivariate normal and/or variance. Can someone tell me whether these
>> numbers look sane?
>>
>> For the bivariate case I do something like this:
>>
>> means = [100, 100]
>> variances = [100, 1000]
>> Sx, Sy = variances
>> sx, sy = map(sqrt, variances)
>> cor = 0.7
>> cov = [[Sx, cor*sx*sy], [cor*sy*sx, Sy]]
>> draws = 10
>> samples = multivariate_normal(means, cov, draws)
>>
>> As mentioned before, the samples are shockingly close to their means.
> They look right to me.
>
> [~]
> |19>  means = [100, 100]
>
> [~]
> |20>  variances = [100, 1000]
>
> [~]
> |21>  Sx, Sy = variances
>
> [~]
> |22>  sx, sy = map(sqrt, variances)
>
> [~]
> |23>  cor = 0.7
>
> [~]
> |24>  cov = np.array([[Sx, cor*sx*sy], [cor*sy*sx, Sy]])
>
> [~]
> |26>  samples = np.random.multivariate_normal(means, cov, 10000)
>
> [~]
> |27>  cov
> array([[  100.        ,   221.35943621],
>         [  221.35943621,  1000.        ]])
>
> [~]
> |28>  np.cov(samples.T)
> array([[  101.16844481,   222.00301056],
>         [  222.00301056,  1001.58403922]])
>