Hi List,
I'm new to Numpy and I'm a little confused about the behavior of
numpy.random.multivariate_normal(). I'm not sure I'm passing the
variances correctly. My goal is to sample from a bivariate normal, but
the kooky behavior shows up when I sample from a univariate distribution.
In short, the multivariate normal function doesn't seem to give me
values in the appropriate ranges. Here's an example of what I mean.
In[1]: from numpy.random import normal, multivariate_normal
In [2]: normal(100, 100, 10)
Out[2]:
array([ 258.62344586, 70.16378815, 49.46826997, 49.58567724,
182.68652256, 226.67292034, 92.03801549, 18.2686146 ,
94.09104313, 80.35697507])
The samples look about right to me. But then when I try to do the same
using the multivariate_normal, the values it draws look too close to the
mean.
In [3]: multivariate_normal([100], [[100]], 10)
Out[3]:
array([[ 109.10083984],
[ 97.43526716],
[ 108.43923772],
[ 97.87345947],
[ 103.405562 ],
[ 110.2963736 ],
[ 103.96445585],
[ 90.58168544],
[ 91.20549222],
[ 104.4051761 ]])
These values all fall within 10 units of the mean.
In [4]: multivariate_normal([100], [[1000]], 10)
Out[4]:
array([[ 62.04304611],
[ 123.29364557],
[ 83.53840083],
[ 64.67679778],
[ 127.82433157],
[ 11.3305656 ],
[ 95.4101701 ],
[ 126.53213908],
[ 104.68868736],
[ 32.45886112]])
In [5]: normal(100, 1000, 10)
Out[5]:
array([ 1052.93998938, -1254.12576419, 258.75390045, 688.32715327,
-2.36543806, -1570.54842269, 472.90045029, 394.62908014,
137.10320437, 1741.85017871])
And just to exaggerate things a little more:
In [6]: multivariate_normal([100], [[10000]], 10).T][0]
Out[6]:
array([ 274.45446694, 85.79359682, 245.03248721, 120.10912405,
-34.76526896, 134.93446664, 47.6768889 , 93.34140984,
80.27244669, 229.64700591])
Whereas
In [7]: normal(100, 10000, 10)
Out[7]:
array([ -554.68666687, 3724.59638363, -14873.55303901, -3111.22162495,
-10813.66412901, 4688.81310356, -9510.92470735, -12689.02667559,
-10379.01381925, -4534.60480031])
I'm shocked that I don't get some negative values in Out[4]. And Out[6]
really ought to have some numbers in the thousands.
I'd totally be willing to believe that I don't understand the
multivariate normal and/or variance. Can someone tell me whether these
numbers look sane?
For the bivariate case I do something like this:
means = [100, 100]
variances = [100, 1000]
Sx, Sy = variances
sx, sy = map(sqrt, variances)
cor = 0.7
cov = [[Sx, cor*sx*sy], [cor*sy*sx, Sy]]
draws = 10
samples = multivariate_normal(means, cov, draws)
As mentioned before, the samples are shockingly close to their means.
Thanks,
Josh