[SciPy-User] linear algebra: quadratic forms without linalg.inv
Sturla Molden
sturla at molden.no
Mon Nov 2 04:37:26 EST 2009
josef.pktd at gmail.com wrote:
> Good, I didn't realize this when I worked on the eig and svd versions of
> the pca. In a similar way, I was initially puzzled that pinv can be used
> on the data matrix or on the covariance matrix (only the latter I have seen
> in books).
>
>
I'll try to explain... If you have a matrix C, you can factorize like
this, with Sigma being a diagonal matrix:
C = U * Sigma * V'
>>> u,s,vt = np.linalg.svd(c)
If C is square (n x n) and has full rank, we now have the inverse
C**-1 = V * [S**-1] * U'
>>> c_inv = np.mat(vt.T) * np.mat(np.diag(1.0/s)) * np.mat(u.T)
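For a concrete check, here is a small sketch with a made-up 3 x 3 matrix c,
verifying the reconstruction of the inverse against np.linalg.inv:
>>> import numpy as np
>>> c = np.array([[4., 1., 0.],
...               [1., 3., 1.],
...               [0., 1., 2.]])
>>> u, s, vt = np.linalg.svd(c)
>>> c_inv = np.dot(vt.T, np.dot(np.diag(1.0/s), u.T))   # V * [S**-1] * U'
>>> np.allclose(c_inv, np.linalg.inv(c))
True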
And here you have the pathology diagnosis:
A small value of s will cause a huge value of 1/s. This is the
"ill-conditioning" that e.g. happens with multicollinearity. You get a
small s, you divide by it, and the rounding error skyrockets. We can improve
the situation by editing the tiny values in Sigma to zero. That changes C
by only a tiny amount, but can have a dramatic stabilizing effect
on C**-1. Now you can do your LU and not worry. It might not be clear
from statistics textbooks why multicollinearity is a problem. But using
the SVD, we see both the problem and the solution very clearly: a small
singular value might not contribute significantly to C, but could
severely affect or even dominate C**-1. We can thus get a biased but
numerically better approximation to C**-1 by deleting it from the
equation. So after editing s, we could e.g. do:
>>> c_fixed = np.mat(u) * np.mat(np.diag(s)) * np.mat(vt)
and continue with LU on c_fixed to get the quadratic form.
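Concretely, the "editing" of s could look like this (the relative cutoff is
an arbitrary assumption here; np.linalg.pinv makes the same cut through its
rcond argument):
>>> tol = 1e-12 * s.max()                     # relative cutoff, chosen arbitrarily
>>> s_edited = np.where(s >= tol, s, 0.0)     # edit the tiny singular values to zero
>>> s_inv = np.zeros_like(s)
>>> s_inv[s_edited > 0] = 1.0 / s_edited[s_edited > 0]
>>> c_pinv = np.dot(vt.T, np.dot(np.diag(s_inv), u.T))   # dropped values never enter C**-1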
Also beware that you can solve
C * x = b
like this
x = (V * [S**-1]) * (U' * b)
But if we are to repeat this for several values of b, it would make
more sense to reconstruct C and go for the LU. Solving with LU also
involves two steps, the triangular solves:
L * y = b
U * x = y
but the computational demand is reduced by the triangular structure of L
and U.
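Reusing c, u, s, vt from the sketch above, both routes look like this
(scipy.linalg provides the LU factorization; the right-hand side is made up):
>>> from scipy.linalg import lu_factor, lu_solve
>>> b = np.array([1., 2., 3.])
>>> x_svd = np.dot(vt.T, np.dot(u.T, b) / s)   # x = (V * [S**-1]) * (U' * b)
>>> lu, piv = lu_factor(c)                     # factor once ...
>>> x_lu = lu_solve((lu, piv), b)              # ... then reuse for every new b
>>> np.allclose(x_svd, x_lu)
True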
Please don't say you'd rather preprocess data with a PCA. If C was a
covariance matrix, we just threw the smallest principal components out of
the data. Deleting tiny singular values is in fact why PCA helps!
Also beware that
pca = lambda x : np.linalg.svd(x-x.mean(axis=0), full_matrices=0)
So we can get the PCA from an SVD without even calculating the covariance.
Now you have the singular values (proportional to the standard deviations)
in Sigma, the principal components in V, and the factor loadings in U. SVD
is how PCA is usually computed. It is better than estimating Cov(X) and
then applying Jacobi rotations to get the eigenvalues and eigenvectors of
Cov(X). One reason is that Cov(X) should be estimated using a "two-pass
algorithm" to cancel accumulating rounding error (Am Stat, 37: p. 242-247).
But that equation is not shown in most statistics textbooks, so most
practitioners tend not to know of it.
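As a sketch with made-up data (observations in rows), the singular values of
the centered data carry the same information as the eigenvalues of Cov(X),
up to the scaling by n - 1:
>>> x = np.random.randn(100, 3)                          # hypothetical data matrix
>>> u, s, vt = np.linalg.svd(x - x.mean(axis=0), full_matrices=0)
>>> variances = s**2 / (x.shape[0] - 1)                  # variances along the principal axes
>>> np.allclose(variances, np.linalg.eigvalsh(np.cov(x, rowvar=0))[::-1])
True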
We can solve the common least squares problem using an SVD:
b = argmin { || X * b - Y || ** 2 }
If we do an SVD of X, we can compute
b = sum( (np.dot(u[:,i], Y)/s[i]) * vt[i,:] for i in range(len(s)) )
Unlike the other methods of fitting least squares, this one does not break
down, even when X is rank deficient.
And you also see clearly what a PCA will do:
Skip the terms (np.dot(u[:,i], Y)/s[i]) * vt[i,:] for too small values of s[i].
So you can preprocess with PCA and fit the LS in one shot.
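A sketch with a made-up design matrix, checking the sum against the
pseudo-inverse solution (a term with a tiny s[i] is where you would cut,
as described above):
>>> X = np.random.randn(50, 3)                 # hypothetical design matrix
>>> Y = np.random.randn(50)
>>> u, s, vt = np.linalg.svd(X, full_matrices=0)
>>> b = sum((np.dot(u[:, i], Y) / s[i]) * vt[i, :]
...         for i in range(len(s)))            # drop terms with tiny s[i] to fold in the PCA
>>> np.allclose(b, np.dot(np.linalg.pinv(X), Y))
True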
Ridge regression (Tikhonov regularization) is another solution to the
multicollinearity problem:
(A'A + lambda*I)*x = A'b
But how would you choose the numerically optimal value of lambda? It
turns out to be a case for the SVD as well. Golub & Van Loan have that on
page 583.
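A sketch of the SVD route to the ridge solution, with a made-up design
matrix and an arbitrary lambda (not the optimal one):
>>> A = np.random.randn(50, 4)                          # hypothetical design matrix
>>> b = np.random.randn(50)
>>> lam = 0.1                                           # arbitrary; choosing it well is the hard part
>>> u, s, vt = np.linalg.svd(A, full_matrices=0)
>>> x = np.dot(vt.T, (s / (s**2 + lam)) * np.dot(u.T, b))   # x = V * diag(s/(s**2+lam)) * U' * b
>>> np.allclose(x, np.linalg.solve(np.dot(A.T, A) + lam * np.eye(4), np.dot(A.T, b)))
True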
QR with column pivoting can be seen as a case of SVD. Many use this for
least-squares, not even knowing it is SVD. So SVD is ubiquitous in data
modelling, even if you don't know it. :-)
One more thing: the Cholesky factorization is always stable, the LU is
not. But don't be fooled: this only applies to the factorization itself.
If you have multicollinearity, the problem is there even if you use
Cholesky. You get the "singular value disease" (astronomical rounding
error) when you solve the triangular systems. A Cholesky can tell you if
a covariance matrix is singular at your numerical precision. An SVD can
tell you how close to singularity it is, and how to fix it. SVD comes at
a cost, which is slower computation. But usually it is worth the extra
investment in CPU cycles.
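A small sketch of that distinction, with a made-up, nearly singular
covariance matrix:
>>> cov = np.array([[1.0, 1.0],
...                 [1.0, 1.0 + 1e-12]])       # nearly singular, but not quite
>>> L = np.linalg.cholesky(cov)                # succeeds: not singular at double precision
>>> s = np.linalg.svd(cov, compute_uv=False)
>>> cond = s[0] / s[-1]                        # roughly 4e12 here: almost singular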
Sturla Molden