On 06/29/2010 11:38 PM, David Goldsmith wrote:
On Tue, Jun 29, 2010 at 8:16 PM, Bruce Southey <bsouthey@gmail.com <mailto:bsouthey@gmail.com>> wrote:
On Tue, Jun 29, 2010 at 6:03 PM, David Goldsmith <d.l.goldsmith@gmail.com <mailto:d.l.goldsmith@gmail.com>> wrote: > On Tue, Jun 29, 2010 at 3:56 PM, <josef.pktd@gmail.com <mailto:josef.pktd@gmail.com>> wrote: >> >> On Tue, Jun 29, 2010 at 6:37 PM, David Goldsmith >> <d.l.goldsmith@gmail.com <mailto:d.l.goldsmith@gmail.com>> wrote: >> > ...concerns the behavior of numpy.random.multivariate_normal; if that's >> > of >> > interest to you, I urge you to take a look at the comments (esp. mine >> > :-) ); >> > otherwise, please ignore the noise. Thanks! >> >> You should add the link to the ticket, so it's faster for everyone to >> check what you are talking about. >> >> Josef > > Ooops! Yes I should; here it is: > > http://projects.scipy.org/numpy/ticket/1223 > Sorry, and thanks, Josef. > > DG > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > As I recall, there is no requirement for the variance/covariance of the normal distribution to be positive definite.
No, not positive definite, positive *semi*-definite: yes, the variance may be zero (the cov may have zero-valued eigenvalues), but the claim (and I actually am "neutral" about it, in that I wanted to reference the claim in the docstring and was told that doing so was unnecessary, the implication being that this is a "well-known" fact), is that, in essence (in 1-D) the variance can't be negative, which seems clear enough. I don't see you disputing that, and so I'm uncertain as to how you feel about the proposal to "weakly" enforce symmetry and positive *semi*-definiteness. (Now, if you dispute that even requiring positive *semi*-definiteness is desirable, you'll have to debate that w/ some of the others, because I'm taking their word for it that indefiniteness is "unphysical.")
DG
From http://en.wikipedia.org/wiki/Multivariate_normal_distribution "The covariance matrix is allowed to be singular (in which case the corresponding distribution has no density)."
So you must be able to draw random numbers from such a distribution. Obviously what those numbers really mean is another matter (I presume the dependent variables should be a linear function of the independent variables) but the user *must* know since they entered it. Since the function works the docstring Notes comment must be wrong.
Imposing any restriction means that this is no longer a multivariate normal random number generator. If anything, you can only raise a warning about possible non-positive definiteness but even that will vary depending how it is measured and on the precision being used.
Bruce _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Mathematician: noun, someone who disavows certainty when their uncertainty set is non-empty, even if that set has measure zero.
Hope: noun, that delusive spirit which escaped Pandora's jar and, with her lies, prevents mankind from committing a general suicide. (As interpreted by Robert Graves)
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
As you (and the theory) say, a variance should not be negative - yeah right :-) In practice that is not exactly true because estimation procedures like equating observed with expected sum of squares do lead to negative estimates. However, that is really a failure of the model, data and algorithm. I think the issue is really how numpy should handle input when that input is theoretically invalid. I (and apparent the bug submitter) do not know what to expect if the input is not positive definite. If the svd approach is correct for such cases and numpy 'trusts' the user, as the usual case, then there is no issue. If the svd approach is incorrect for such cases then that is obviously a bug. If numpy can not trust the user then numpy has to check and either raise a warning or error if the input variances are greater than or equal to zero and that the cov argument is symmetric. Replacing the SVD with cholesky would also address these issues as both of these are checked by numpy's cholesky function. However, cholesky() does not support semi-positive covariance/variance input (which is possible http://en.wikipedia.org/wiki/Cholesky_decomposition#Proof_for_positive_semi-...). Also as Robert said in the thread that 'Cholesky decomposition gave an error "too soon" in my estimation'. Bruce