Hi all,
> How are you using the values? How significant are the differences?
I am using these eigenvectors to do PCA on a set of images (of faces). I sort the eigenvectors in descending order of their eigenvalues, and multiply them with the original data (a matrix of images) to obtain a face space.
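In numpy terms, that step might look something like the sketch below (a rough illustration only; all names here are mine):

    import numpy as np

    def face_space(data, eigenvalues, eigenvectors, n_components=20):
        # data: (n_images, n_pixels) matrix of mean-subtracted images
        # eigenvectors: one eigenvector of the covariance matrix per column
        order = np.argsort(eigenvalues)[::-1]        # descending eigenvalues
        top = eigenvectors[:, order[:n_components]]  # keep leading components
        return np.dot(data, top)                     # coordinates in face space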
I've dealt with similar issues a lot -- performing PCA on data where the dimensionality of the data is much greater than the number of data points (like images). In this case, the maximum number of non-trivial eigenvectors of the covariance matrix of the data is min(dimension_of_data, number_of_data_points), so one always runs into the zero-eigenvalue problem; the matrix is thus always ill-conditioned, but that's not a problem in these cases.

Nevertheless, if you've got (say) 100 images that are each 100x100 pixels, to do PCA in the naive way you need to build a 10000x10000 covariance matrix and then decompose it into 10000 eigenvectors and eigenvalues just to get out the 100 non-trivial ones. That's a lot of computation wasted calculating noise!

Fortunately, there are better ways. One is to perform the SVD on the 100x10000 data matrix. Let the centered (mean-subtracted) data matrix be D; the SVD then provides matrices U, S, and V'. IIRC, the eigenvectors of D'D (the covariance matrix of interest) are packed along the first dimension of V' (i.e., the rows of V'), and the eigenvalues are the squares of the singular values in S.

But! There's an even faster way (from my testing). The trick is that instead of calculating the 10000x10000 "outer" covariance matrix D'D, or doing the SVD on D, one can calculate the 100x100 "inner" covariance matrix DD', perform the eigen-decomposition on that, and then trivially transform those eigenvalues and eigenvectors into the ones of the D'D matrix. This computation is often substantially faster than the SVD.

Here's how it works. Let D, our re-centered data matrix, be of shape (n, k) -- that is, n data points in k dimensions. We know that D has a singular value decomposition D = USV' (no need to calculate the SVD, though; it's enough to know that it exists). From this, we can rewrite the covariance matrices:

  D'D = VS'SV'
  DD' = USS'U'

Now, from the SVD, we know that S'S and SS' are diagonal matrices, and that V and U (and V' and U') are orthogonal. One way to write the eigen-decomposition of a matrix is A = QLQ', where Q is orthogonal and L is diagonal. Since the eigen-decomposition is unique (up to a permutation of the columns of Q and L), we know that V must contain the eigenvectors of D'D in its columns, and U must contain the eigenvectors of DD' in its columns. This is the origin of the SVD recipe I gave above.

Further, let S_hat, of shape (k, n), be the transpose of S with each non-zero entry replaced by its reciprocal (i.e. SS_hat = I of shape (n, n) and S_hatS = I of shape (k, k), where I is the identity matrix, up to the zero singular values). Then we can solve for U or V in terms of the other:

  V = D'US_hat'
  U = DVS_hat

So, to get the eigenvectors and eigenvalues of D'D, we just calculate DD' and then apply the symmetric eigen-decomposition (the symmetric version is faster, and DD' is symmetric) to get eigenvectors U and eigenvalues L. We know that L = SS', so S_hat = 1/sqrt(L) (where the sqrt is taken elementwise, of course, and S_hat is now the square diagonal matrix built from those values, so S_hat' = S_hat). The eigenvectors we're looking for are then:

  V = D'US_hat

The principal components (eigenvectors) are in the columns of V (packed along the second dimension of V).

Fortunately, I've packaged this all up into a Python module for PCA that takes care of all this. It's attached.

Zach Pincus

Postdoctoral Fellow, Lab of Dr. Frank Slack
Molecular, Cellular and Developmental Biology
Yale University
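For concreteness, here is a minimal numpy sketch of the inner-covariance trick described above. This is not the attached module; the function name pca_small and the rank tolerance are illustrative choices:

    import numpy as np

    def pca_small(data):
        # data: (n, k) array with n data points in k >> n dimensions.
        # Returns the non-trivial eigenvalues and eigenvectors of D'D.
        D = data - data.mean(axis=0)      # re-center the data
        inner = np.dot(D, D.T)            # n x n "inner" covariance DD'
        L, U = np.linalg.eigh(inner)      # symmetric eigen-decomposition
        order = np.argsort(L)[::-1]       # eigh sorts ascending; flip it
        L, U = L[order], U[:, order]
        keep = L > 1e-12 * L[0]           # drop numerically zero eigenvalues
        L, U = L[keep], U[:, keep]
        # V = D'US_hat with S_hat = diag(1/sqrt(L)); the division broadcasts
        # across columns, so each eigenvector comes out at unit length.
        V = np.dot(D.T, U) / np.sqrt(L)
        return L, V                       # columns of V are the components

The result can be sanity-checked against the SVD route: with U2, s, Vt = np.linalg.svd(D, full_matrices=False), the columns of V should match the rows of Vt up to sign, and L should equal s**2.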