Bias in numpy.random.multivariate_normal?
Could someone confirm I've got this right? In a multivariate normal (MVN) the covariance is E(X.X') - E(X).E(X'). I sample many times from an MVN with a given mean and covariance, then calculate the covariance of the sample. This sample covariance should show no bias towards being larger or smaller than the distribution's covariance, but I see that some entries are always larger and others always smaller. See this code:

from numpy.random import multivariate_normal
from numpy.linalg import inv
from numpy import outer

mu = [10., 10.]
precision = [[.6, .3], [.4, .3]]
sigma = inv(precision)
sample_size = 10000
num_tests = 100

for test in xrange(num_tests):
    samples = multivariate_normal(mu, sigma, [sample_size])
    sample_mean = samples.sum(axis=0) / sample_size
    #print (sample_mean - mu) > 0.
    sample_covariance = sum(
        outer(sample - sample_mean, sample - sample_mean)
        for sample in samples
    ) / sample_size
    print (sample_covariance - sigma) > 0

'True' is printed where the entry is larger, 'False' otherwise.

Did I get this wrong, is this acceptable behaviour for a random variate generator, or is it a bug?

Thanks, John.
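The experiment above can be restated more compactly with current NumPy; a sketch in which the RNG seed and the use of np.cov are our additions, while mu and precision are copied verbatim from the post:

```python
import numpy as np

# Compact restatement of the experiment above; the seed and np.cov
# are additions, mu/precision are verbatim from the post.
rng = np.random.default_rng(0)
mu = np.array([10.0, 10.0])
precision = np.array([[0.6, 0.3], [0.4, 0.3]])
sigma = np.linalg.inv(precision)
samples = rng.multivariate_normal(mu, sigma, size=10000)
# np.cov divides by n-1 by default (the unbiased estimator)
sample_cov = np.cov(samples, rowvar=False)
print(sample_cov - sigma)
```

Note that recent NumPy versions may emit a warning here about the covariance argument, which foreshadows the resolution found later in the thread.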
IIRC, you computed the biased variance, so it is normal to see bias. Try dividing by (sample_size - 1); that should give the unbiased estimator.

Matthieu

2007/11/10, John Reid <j.reid@mail.cryst.bbk.ac.uk>:
_______________________________________________
SciPy-user mailing list
SciPy-user@scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user
--
French PhD student
Website: http://miles.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher wrote:
IIRC, you computed the biased variance, so it is normal to see bias. Try dividing by (sample_size - 1); that should give the unbiased estimator.
That wouldn't change the sign of the answer. In any case, I get the same results when dividing by sample_size - 1.
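John's point, that switching to the (sample_size - 1) divisor only rescales the estimate, can be checked directly with np.cov's ddof parameter; a minimal sketch (the data and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 3))
biased = np.cov(x, rowvar=False, ddof=0)    # divide by n
unbiased = np.cov(x, rowvar=False, ddof=1)  # divide by n-1 (the default)
# The two estimators differ only by the scalar factor n/(n-1), a
# negligible rescaling at sample_size = 10000, so it cannot explain a
# systematic pattern of entries above/below the true covariance.
assert np.allclose(unbiased, biased * n / (n - 1))
```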
The covariance matrix provided in sigma is not symmetric. Forcing symmetry by changing the definition of sigma to

sigma = inv(precision)
sigma[1,0] = sigma[0,1]

yields behaviour that I suspect is in line with your expectations. Perhaps a check for symmetry in the cov argument would be a desirable addition to multivariate_normal?

Bryan

On Nov 10, 2007 12:55 PM, John Reid <j.reid@mail.cryst.bbk.ac.uk> wrote:
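Bryan's fix can be verified numerically; a sketch (the seed is our choice) showing that once sigma is symmetrized, the sample covariance lands close to sigma, with only sampling noise rather than a fixed sign pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
precision = np.array([[0.6, 0.3], [0.4, 0.3]])
sigma = np.linalg.inv(precision)
sigma[1, 0] = sigma[0, 1]            # force symmetry, as suggested
samples = rng.multivariate_normal([10.0, 10.0], sigma, size=10000)
sample_cov = np.cov(samples, rowvar=False)   # unbiased estimator
print(np.abs(sample_cov - sigma).max())      # small sampling noise only
```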
Bryan Nollett wrote:
The covariance matrix provided in sigma is not symmetric.
Good point. I needed some test covariance matrices and completely forgot the symmetry requirement. There's no way to draw from the Wishart or inverse-Wishart distributions, is there? Their pdfs would be handy too.

Thanks, John.
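In the meantime, Wishart variates can be drawn by hand: if x_1, ..., x_n are i.i.d. N(0, Sigma) vectors, then the sum of the outer products x_i x_i' has a Wishart(n, Sigma) distribution. A sketch (the function name and its parameters are ours, not from any library):

```python
import numpy as np

def wishart_rvs(df, scale, rng=None):
    """One draw from Wishart(df, scale), built as a sum of outer
    products of df i.i.d. zero-mean multivariate normal vectors."""
    rng = np.random.default_rng(rng)
    p = scale.shape[0]
    x = rng.multivariate_normal(np.zeros(p), scale, size=df)  # (df, p)
    return x.T @ x

scale = np.array([[2.0, 0.3], [0.3, 1.0]])
w = wishart_rvs(df=5, scale=scale, rng=0)
# Sanity check over many draws: E[W] = df * scale
mean_w = np.mean([wishart_rvs(5, scale, rng=i) for i in range(2000)], axis=0)
```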
Did you try the rvs method of those classes?

Matthieu

2007/11/11, John Reid <j.reid@mail.cryst.bbk.ac.uk>:
John Reid wrote:
Matthieu Brucher wrote:
Did you try the rvs method of those classes ?
Of which classes? I couldn't see anything in scipy.stats or scipy.stats.distributions to do with the Wishart distribution.
I don't think there is. This would be handy, indeed. So handy that I have just created a ticket for it :) http://scipy.org/scipy/scipy/ticket/536 cheers, David
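A note for later readers: SciPy has since gained these distributions. From roughly version 0.16 on, scipy.stats.wishart and scipy.stats.invwishart provide rvs and logpdf methods; a sketch assuming a recent SciPy:

```python
import numpy as np
from scipy.stats import wishart, invwishart  # SciPy >= 0.16

scale = np.array([[2.0, 0.3], [0.3, 1.0]])
w = wishart(df=5, scale=scale)          # frozen distribution
draws = w.rvs(size=3, random_state=0)   # array of shape (3, 2, 2)
logp = w.logpdf(draws[0])               # log-density of one draw
iw = invwishart(df=5, scale=scale)
iw_draw = iw.rvs(random_state=0)        # a single (2, 2) draw
```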
participants (4)
- Bryan Nollett
- David Cournapeau
- John Reid
- Matthieu Brucher