[SciPy-User] linear algebra: quadratic forms without linalg.inv
Sturla Molden
sturla at molden.no
Mon Nov 2 04:37:26 EST 2009
josef.pktd at gmail.com wrote:
> Good, I didn't realize this when I worked on the eig and svd versions of
> the pca. In a similar way, I was initially puzzled that pinv can be used
> on the data matrix or on the covariance matrix (only the latter I have seen
> in books).
>
>
I'll try to explain... If you have a matrix C, you can factorize like
this, with Sigma being a diagonal matrix:
C = U * Sigma * V'
>>> u,s,vt = np.linalg.svd(c)
If C is square (n x n) and has full rank, we now have the inverse
C**-1 = V * [S**-1] * U'
>>> c_inv = np.mat(vt.T) * np.mat(np.diag(1.0/s)) * np.mat(u.T)
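For a concrete check, here is a small sketch with a made-up 3 x 3 matrix c,
verifying the reconstruction of the inverse against np.linalg.inv:
>>> import numpy as np
>>> c = np.array([[4., 1., 0.],
...               [1., 3., 1.],
...               [0., 1., 2.]])
>>> u, s, vt = np.linalg.svd(c)
>>> c_inv = np.dot(vt.T, np.dot(np.diag(1.0/s), u.T))   # V * [S**-1] * U'
>>> np.allclose(c_inv, np.linalg.inv(c))
True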
And here you have the pathology diagnosis:
A small value of s will cause a huge value of 1/s. This is the
"ill-conditioning" that e.g. happens with multicollinearity. You get a
small s, you divide by it, and the rounding error skyrockets. We can improve
the situation by editing the tiny values in Sigma to zero. That changes C
by only a tiny amount, but can have a dramatic stabilizing effect
on C**-1. Now you can do your LU and not worry. It might not be clear
from statistics textbooks why multicollinearity is a problem. But using
the SVD, we see both the problem and the solution very clearly: a small
singular value might not contribute significantly to C, but could
severely affect or even dominate C**-1. We can thus get a biased but
numerically better approximation to C**-1 by deleting it from the
equation. So after editing s, we could e.g. do:
>>> c_fixed = np.mat(u) * np.mat(np.diag(s)) * np.mat(vt)
and continue with LU on c_fixed to get the quadratic form.
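Concretely, the "editing" of s could look like this (the relative cutoff is
an arbitrary assumption here; np.linalg.pinv makes the same cut through its
rcond argument):
>>> tol = 1e-12 * s.max()                     # relative cutoff, chosen arbitrarily
>>> s_edited = np.where(s >= tol, s, 0.0)     # edit the tiny singular values to zero
>>> s_inv = np.zeros_like(s)
>>> s_inv[s_edited > 0] = 1.0 / s_edited[s_edited > 0]
>>> c_pinv = np.dot(vt.T, np.dot(np.diag(s_inv), u.T))   # dropped values never enter C**-1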
Also beware that you can solve
C * x = b
like this
x = (V * [S**-1]) * (U' * b)
But if we are to repeat this for several values of b, it would make
more sense to reconstruct C and go for the LU. Solving with LU also
involves two steps, the triangular solves:
L * y = b
U * x = y
but the computational demand is reduced by the triangular structure of L
and U.
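Reusing c, u, s, vt from the sketch above, both routes look like this
(scipy.linalg provides the LU factorization; the right-hand side is made up):
>>> from scipy.linalg import lu_factor, lu_solve
>>> b = np.array([1., 2., 3.])
>>> x_svd = np.dot(vt.T, np.dot(u.T, b) / s)   # x = (V * [S**-1]) * (U' * b)
>>> lu, piv = lu_factor(c)                     # factor once ...
>>> x_lu = lu_solve((lu, piv), b)              # ... then reuse for every new b
>>> np.allclose(x_svd, x_lu)
True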
Please don't say you'd rather preprocess data with a PCA. If C was a
covariance matrix, we just threw the smallest principal components out of
the data. Deleting tiny singular values is in fact why PCA helps!
Also beware that
pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0)
So we can get the PCA from an SVD without even calculating the covariance.
Now you have the singular values (proportional to the standard deviations)
in Sigma, the principal components in V, and the factor loadings in U. SVD
is how PCA is usually computed. It is better than estimating Cov(X) and
then applying Jacobi rotations to get the eigenvalues and eigenvectors of
Cov(X). One reason is that Cov(X) should be estimated using a "two-pass
algorithm" to cancel accumulating rounding error (Am Stat, 37: p. 242-247).
But that equation is not shown in most statistics textbooks, so most
practitioners tend not to know of it.
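As a sketch with made-up data (observations in rows), the singular values of
the centered data carry the same information as the eigenvalues of Cov(X),
up to the scaling by n - 1:
>>> x = np.random.randn(100, 3)                          # hypothetical data matrix
>>> u, s, vt = np.linalg.svd(x - x.mean(axis=0), full_matrices=0)
>>> variances = s**2 / (x.shape[0] - 1)                  # variances along the principal axes
>>> np.allclose(variances, np.linalg.eigvalsh(np.cov(x, rowvar=0))[::-1])
True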
We can solve the common least squares problem using an SVD:
b = argmin { || X * b - Y || ** 2 }
If we do an SVD of X, we can compute
b = sum( (np.dot(u[:,i], Y)/s[i]) * vt[i,:] for i in range(len(s)) )
Unlike the other methods of fitting least squares, this one does not break
down, even when X is rank deficient.
And you also see clearly what a PCA will do:
Skip the terms (np.dot(u[:,i], Y)/s[i]) * vt[i,:] for too small values of s[i].
So you can preprocess with PCA and fit the LS in one shot.
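A sketch with a made-up design matrix, checking the sum against the
pseudo-inverse solution (a term with a tiny s[i] is where you would cut,
as described above):
>>> X = np.random.randn(50, 3)                 # hypothetical design matrix
>>> Y = np.random.randn(50)
>>> u, s, vt = np.linalg.svd(X, full_matrices=0)
>>> b = sum((np.dot(u[:, i], Y) / s[i]) * vt[i, :]
...         for i in range(len(s)))            # drop terms with tiny s[i] to fold in the PCA
>>> np.allclose(b, np.dot(np.linalg.pinv(X), Y))
True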
Ridge regression (Tikhonov regularization) is another solution to the
multicollinearity problem:
(A'A + lambda*I)*x = A'b
But how would you choose the numerically optimal value of lambda? It
turns out to be a case for the SVD as well. Golub & Van Loan have that on
page 583.
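A sketch of the SVD route to the ridge solution, with a made-up design
matrix and an arbitrary lambda (not the optimal one):
>>> A = np.random.randn(50, 4)                          # hypothetical design matrix
>>> b = np.random.randn(50)
>>> lam = 0.1                                           # arbitrary; choosing it well is the hard part
>>> u, s, vt = np.linalg.svd(A, full_matrices=0)
>>> x = np.dot(vt.T, (s / (s**2 + lam)) * np.dot(u.T, b))   # x = V * diag(s/(s**2+lam)) * U' * b
>>> np.allclose(x, np.linalg.solve(np.dot(A.T, A) + lam * np.eye(4), np.dot(A.T, b)))
True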
QR with column pivoting can be seen as a case of SVD. Many use this for
least-squares, not even knowing it is SVD. So SVD is ubiquitous in data
modelling, even if you don't know it. :-)
One more thing: the Cholesky factorization is always stable, the LU is
not. But don't be fooled: this only applies to the factorization itself.
If you have multicollinearity, the problem is there even if you use
Cholesky. You get the "singular value disease" (astronomical rounding
error) when you solve the triangular systems. A Cholesky can tell you if
a covariance matrix is singular at your numerical precision. An SVD can
tell you how close to singularity it is, and how to fix it. SVD comes at
a cost, which is slower computation. But usually it is worth the extra
investment in CPU cycles.
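A small sketch of that distinction, with a made-up, nearly singular
covariance matrix:
>>> cov = np.array([[1.0, 1.0],
...                 [1.0, 1.0 + 1e-12]])       # nearly singular, but not quite
>>> L = np.linalg.cholesky(cov)                # succeeds: not singular at double precision
>>> s = np.linalg.svd(cov, compute_uv=False)
>>> cond = s[0] / s[-1]                        # roughly 4e12 here: almost singular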
Sturla Molden