[SciPy-User] Cholesky for semi-definite matrices?

Tue Jul 12 10:23:17 EDT 2011

On Sat, Jul 09, 2011 at 08:36:01PM +0300, eat wrote:
> > (It's a
> > covariance matrix computed from not-enough samples, so it's positive
> > semi-definite but rank-deficient.) 

> How bout a slight regularization,

Indeed, from a statistics point of view, and non definite positive
covariance matrix is simply an estimation error. As a descriptive
statistic, it may look good, but it does not contain much information on
the population covariance. The easiest solution is to regularize it.

The good news is that there exists good heuristics (approximate oracles)
to find the optimal regularization parameter). If you assume that your
underlying data is Gaussian, the 'Approximate Oracle Shrinkage' is
optimal. If you don't want this assumption, the Ledoit-Wolf shrinkage
works great. In practice they are often similar, but LW tends to
under-penalize compared to AOS if you have little data compare to the
dimension of your dataset.

Another good news is that you can find Python implementations of these
heuristics in the scikit-learn:
https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/covariance/shrunk_covariance_.py
you will find references to the paper if you are interested, and you can
simply copy out the function from the scikit-learn if you don't want to
depend on it.

HTH,

Gael