[Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices
Blake Griffith
blake.a.griffith at gmail.com
Tue Apr 7 13:03:53 EDT 2015
I like your idea Josef, I'll add it to the PR. Just to be clear, we should
have something like:
Have a single "check_valid" keyword arg, which will default to "warn", since
that is the current behavior. It will check for approximate symmetry,
positive semi-definiteness, and NaNs and infs. The other options for the
check_valid keyword arg will be "ignore" and "raise".
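
To make the proposal concrete, here is a rough sketch of what the checks might
look like. This is not the code in the PR: the helper name check_cov and the
rtol/atol defaults are assumptions for illustration, and it assumes a square
2-D input.

```python
import warnings

import numpy as np


def check_cov(cov, check_valid="warn", rtol=1e-5, atol=1e-8):
    """Hypothetical helper: report problems with a covariance matrix.

    Returns the list of problems found. Depending on check_valid it also
    warns ("warn"), raises ValueError ("raise"), or skips all checks
    ("ignore"). Assumes cov is a square 2-D array-like.
    """
    if check_valid not in ("warn", "raise", "ignore"):
        raise ValueError("check_valid must be 'warn', 'raise' or 'ignore'")
    if check_valid == "ignore":
        return []

    cov = np.asarray(cov, dtype=float)
    problems = []

    if not np.all(np.isfinite(cov)):
        # Stop here: a decomposition of non-finite input is not meaningful.
        problems.append("covariance contains NaN or inf")
    else:
        # Approximate symmetry, so roundoff does not trigger the check.
        if not np.allclose(cov, cov.T, rtol=rtol, atol=atol):
            problems.append("covariance is not symmetric")
        # eigvalsh reads one triangle only, so symmetrize before using it.
        eigvals = np.linalg.eigvalsh((cov + cov.T) / 2)
        tol = atol + rtol * np.abs(eigvals).max()
        if eigvals.min() < -tol:
            problems.append("covariance is not positive semi-definite")

    if problems:
        msg = "; ".join(problems)
        if check_valid == "raise":
            raise ValueError(msg)
        warnings.warn(msg, RuntimeWarning)
    return problems
```

With this sketch, a matrix that is off by roundoff only passes silently, while
a clearly indefinite one warns by default and raises with check_valid="raise".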
What should happen when "fix" is passed for check_valid? Set negative
eigenvalues to 0 and symmetrize the matrix?
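
One possible reading of "fix", as a minimal sketch only (the function name
fix_cov is made up here, and nothing like this is in the PR yet): average the
matrix with its transpose, then clip negative eigenvalues to zero and rebuild.

```python
import numpy as np


def fix_cov(cov):
    """Hypothetical 'fix': symmetrize, then zero out negative eigenvalues.

    Assumes cov is a finite, square 2-D array-like.
    """
    cov = np.asarray(cov, dtype=float)
    sym = (cov + cov.T) / 2                    # force exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)     # decomposition of the symmetric part
    clipped = np.clip(eigvals, 0, None)        # negative eigenvalues become 0
    fixed = (eigvecs * clipped) @ eigvecs.T    # V @ diag(clipped) @ V.T
    return (fixed + fixed.T) / 2               # remove roundoff asymmetry
```

Note that this silently changes the distribution being sampled, which is part
of why I am unsure whether "fix" should be offered at all.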
On Mon, Mar 30, 2015 at 8:34 AM, <josef.pktd at gmail.com> wrote:
> On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith
> <blake.a.griffith at gmail.com> wrote:
> > I have an open PR which lets users control the checks on the input
> > covariance matrix. The matrix is required to be symmetric and positive
> > semi-definite (PSD). The current behavior is that NumPy raises a warning if
> > the matrix is not PSD, and does not even check for symmetry.
> >
> > I added a symmetry check, which raises a warning when the input is not
> > symmetric. And added two keyword args which users can use to turn off the
> > checks/warnings when the matrix is ill-formed. So this would only cause
> > another new warning to be raised in existing code.
> >
> > This is needed because sometimes the covariance matrix is only *almost*
> > symmetric or PSD due to roundoff error.
> >
> > Thoughts?
>
> My only question is why is **exact** symmetry relevant?
>
> AFAIU
> An empirical covariance matrix might not be exactly symmetric unless we
> specifically force it to be. But I don't see why some roundoff errors
> that violate symmetry should be relevant.
>
> use allclose with floating point rtol or equivalent?
>
> Some user code might suddenly get irrelevant warnings.
>
> BTW:
> neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0)
> doesn't need to be calculated if cov_psd is false.
>
> -----
>
> some more:
>
> svd can hang if the values are not finite, i.e. nan or infs
>
> counter proposal would be to add a `check_valid` keyword with option
> ignore, warn, raise, and "fix"
>
> and raise an error if there are nans and check_valid is not ignore.
>
> ---------
>
> aside:
> np.random.multivariate_normal is only relevant if you have a new cov
> each call (or don't mind repeated possibly expensive calculations),
> so, I guess, adding checks by default won't upset many users.
>
>
> Josef
>
>
> >
> >
> > PR: https://github.com/numpy/numpy/pull/5726
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >