Hi, I was calculating eigenvalues and eigenvectors for a covariance matrix using numpy:
adjfaces = matrix(adjarr)
faces_trans = adjfaces.transpose()
covarmat = adjfaces * faces_trans
evalues, evect = eigh(covarmat)
For a sample covarmat like
[[ 1.69365981e+13, 5.44960784e+12, 9.00346400e+12, 2.48352625e+12]
 [ 5.44960784e+12, 5.08860660e+12, 8.67539205e+11, 1.22854045e+12]
 [ 9.00346400e+12, 8.67539205e+11, 1.78184943e+13, 7.94749110e+12]
 [ 2.48352625e+12, 1.22854045e+12, 7.94749110e+12, 9.20247690e+12]]
I get these evalues
[ 3.84433376e03, 4.17099934e+12, 1.71771364e+13, 2.76980401e+13]
and evect
[[ 0.5  0.04330262  0.60041892  0.62259297]
 [ 0.5  0.78034307  0.35933516  0.10928372]
 [ 0.5  0.25371931  0.3700265   0.74074753]
 [ 0.5  0.56992638  0.61111026  0.22743827]]
What bothers me is that for the same covarmat I get a different set of eigenvectors and eigenvalues when I use the Java library Jama's methods:
Matrix faceM = new Matrix(faces, nrfaces, length);
Matrix faceM_transpose = faceM.transpose();
Matrix covarM = faceM.times(faceM_transpose);
EigenvalueDecomposition E = covarM.eig();
double[] eigValue = diag(E.getD().getArray());
double[][] eigVector = E.getV().getArray();
Here the eigValue =
[6.835301757686207E4, 4.170999335736721E12, 1.7177136443134865E13, 2.7698040117669414E13]
and eigVector
[[0.5, 0.04330262221379265, 0.6004189175979487, 0.6225929700052174],
 [0.5, 0.7803430730840767, 0.3593351608695496, 0.10928371540423852],
 [0.49999999999999994, 0.2537193127299541, 0.370026504572483, 0.7407475253159538],
 [0.49999999999999994, 0.5699263825679145, 0.6111102613008821, 0.22743827071497524]]
I am quite confused by this difference in results. The first element in eigValue is different, and the signs in the last column of the eigVectors also differ. Can someone tell me why this happens? Thanks, dn
Hi,
The results are OK; they are very close. Your matrix is almost singular, i.e. badly conditioned. But the results are very close if you check them in a relative way: 3.84433376e03 or 6.835301757686207E4 is the same compared to 2.76980401e+13.
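In numpy terms, the "compare in a relative way" check looks like this. The numbers below are illustrative stand-ins patterned on the posted eigenvalues (the exact values and signs of the two "smallest" entries don't matter, which is the point):

```python
import numpy as np

# Illustrative stand-ins patterned on the posted eigenvalues; the two
# "smallest" entries are deliberately very different in absolute terms.
evals_numpy = np.array([3.8e+03, 4.17099934e+12, 1.71771364e+13, 2.76980401e+13])
evals_jama  = np.array([-6.8e+04, 4.17099934e+12, 1.71771364e+13, 2.76980401e+13])

# Absolutely, the first entries disagree wildly (even in sign)...
assert not np.isclose(evals_numpy[0], evals_jama[0])

# ...but relative to the scale of the problem (the largest eigenvalue),
# the two sets agree: both "smallest eigenvalues" are numerical noise.
rel_diff = np.abs(evals_numpy - evals_jama) / evals_numpy.max()
assert rel_diff.max() < 1e-8
```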
Matthieu
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Yes.
Your first eigenvalue is effectively 0; the values you see are just noise, and different implementations produce different noise.
As for the signs of the eigenvector components: which direction is +X and which is -X is arbitrary. Different implementations follow different conventions as to which is which. Sometimes it's just an accident.
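A minimal sketch of one way to neutralize the arbitrary signs when comparing two libraries' results. The normalization rule here is just one common choice of my own, not something numpy or Jama applies:

```python
import numpy as np

# Eigenvectors are only defined up to sign: v and -v are equally valid.
# One common normalization (a choice made here for illustration, not
# something numpy or Jama does) is to flip each eigenvector so that its
# largest-magnitude component is positive, making results comparable.
def fix_signs(vectors):
    """Flip signs so each COLUMN's largest-magnitude entry is positive."""
    idx = np.abs(vectors).argmax(axis=0)
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    return vectors * signs

A = np.array([[2.0, 1.0], [1.0, 2.0]])
w, v = np.linalg.eigh(A)

# A second library might return some columns negated; after the
# normalization, both versions agree.
assert np.allclose(fix_signs(v), fix_signs(-v))
```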
Nothing-to-see-here-move-along-ly, w
Different implementations follow different conventions as to which is which.
Thank you for the replies. The reason why I asked was that the most significant eigenvectors (sorted according to eigenvalues) are later used in calculations, and then the results obtained differ in Java and Python, so I was worried as to which one to use.
Your matrix is almost singular, is badly conditionned,
Matthieu, can you explain that? I didn't quite get it. dn
The condition number is the ratio between the biggest eigenvalue and the lowest one. In your case, it is about 1e16, i.e. at the limit of the precision of double numbers. That means that some computations that depend on this ratio (like inversions) can lead to numerical errors. In your case it is OK, but you should keep this kind of trouble in mind (read what you can on numerical computations ;))
Matthieu
The vectors that you used to build your covariance matrix all lay in, or close to, a 3-dimensional subspace of the 4-dimensional space in which they were represented. So one of the eigenvalues of the covariance matrix is 0, or close to it; the matrix is singular. Condition is the ratio of the largest eigenvalue to the smallest, and large values can be troublesome. Here it is ~1e17, which is about the dynamic range of doubles. Which means that the value you observe for the smallest eigenvalue is just the result of rounding errors.
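This scenario can be reproduced in miniature. A sketch with made-up data confined to a 3-dimensional subspace of R^4, mirroring the situation described above:

```python
import numpy as np

# Made-up data confined to a 3-dimensional subspace of R^4.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 4))            # 3 directions in R^4
data = rng.standard_normal((4, 3)) @ basis     # 4 vectors inside that subspace

cov = data @ data.T                            # 4x4 covariance, rank 3
evals = np.linalg.eigvalsh(cov)                # ascending order

# The smallest eigenvalue is zero up to rounding noise; its exact value
# (and even its sign) is implementation-dependent noise.
assert abs(evals[0]) < 1e-10 * evals[-1]
```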
w
How are you using the values? How significant are the differences?
Chuck
How are you using the values? How significant are the differences?
I am using these eigenvectors to do PCA on a set of images (of faces). I sort the eigenvectors in descending order of their eigenvalues, and this is multiplied with the original data of some images (a matrix) to obtain a facespace, like:
# pseudocode...
sortedeigenvectors = mysort(eigenvectors)
facespace = sortedeigenvectors * adjfaces  # adjfaces is a matrix
If I do this in python I get a facespace
[[ 1028755.44341439, 1480864.32750018, 1917712.0162213, 983526.60328021, 1662357.13091294, 499792.41540038, 208696.97376238, 916628.92613255, 1454071.95225114, 1563209.39113008, 231969.96968212, 768417.98606125]
 [ 866174.88336972, 1212934.33524067, 543013.86361006, 1352625.86282073, 309872.30710619, 466301.12884198, 216088.93319292, 1512378.8688779, 2581349.03171275, 1797812.01270033, 1876754.7339826, 751781.8166291]
 [ 57026.32567001, 69918.94570563, 399715.51441018, 233720.8360051, 188227.41229887, 177341.47889165, 65241.23138424, 311917.28253664, 1133399.70627111, 1089028.99019462, 684854.41725944, 413465.86494352]
 [ 405955.15245412, 562832.78296479, 864334.63457882, 629752.80210603, 894768.52572026, 578460.80766675, 629146.32442893, 768037.57754708, 485157.28573271, 1718776.11176486, 780929.18155991, 165391.19551137]]
whereas the same steps in java give
[[516653.73649950844, 274000.54127598763, 108557.2732037272, 799041.4108906921, 495577.7478765989, 49125.38109725664, 162041.57505147497, 917033.3002665655, 1207264.8912226136, 1384551.3481945703, 1056098.9289163304, 357801.9553511339],
 [956064.0724430305, 1424775.0801567277, 898684.8188346579, 1385008.5401600213, 514677.038573372, 387195.56502804917, 281164.65362325957, 1512307.8891047493, 2114204.697920214, 1280391.7056360755, 1650660.0594245053, 554096.482085637],
 [666313.7580419029, 1365981.2428742633, 2011095.455319733, 453217.29083790665, 1199981.2283586136, 358852.32104592584, 375855.4012532809, 311436.16701894277, 2033000.776565753, 2418152.391663846, 847661.841421182, 926486.0374297247],
 [593030.0669844414, 121955.63569302124, 124121.99904933537, 697146.7418886195, 1321002.514808584, 743093.1371151333, 493712.52017493406, 767889.8563902564, 487050.6874229272, 641935.1621667973, 310387.14691965195, 246026.0999929544]]
Such a difference causes different results in all calculations involving the facespace.
dn
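A sketch of the sort-and-project step in numpy, with made-up data standing in for the poster's images (the names follow the pseudocode above):

```python
import numpy as np

# Made-up stand-in for the image data: 4 faces, 12 pixels each.
adjfaces = np.random.default_rng(1).standard_normal((4, 12))
covarmat = adjfaces @ adjfaces.T

evalues, evect = np.linalg.eigh(covarmat)  # eigh returns eigenvalues ascending

# The "mysort" step: reorder by DESCENDING eigenvalue. Note that the
# eigenvectors are the COLUMNS of evect (true of both numpy's eigh and
# Jama's getV()), so the columns must be reordered, not the rows.
order = np.argsort(evalues)[::-1]
sortedeigenvectors = evect[:, order]

# Project the data onto the sorted components.
facespace = sortedeigenvectors.T @ adjfaces
```

One classic cause of Java/Python mismatches in exactly this computation is treating the eigenvectors as rows when the library stores them as columns (both numpy's `eigh` and Jama's `getV()` store them as columns); that scrambles the facespace far more than the sign and noise differences discussed above.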
You shouldn't have such differences; that's strange. Are you sure you're using the correct eigenvectors?
Matthieu
Hi,
I would also like to know which Java package you're using. I find that Weka's PCA differs from Matlab's (whereas previous experiments with Scipy's PCA didn't show significant differences from Matlab), but I'm still looking into the cause.
Thanks, and greetings,
Javier Torres
-----Original Message----- From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of devnew@gmail.com Sent: Wednesday, 20 February 2008 14:45 To: numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] finding eigenvectors etc
Hi all,
How are you using the values? How significant are the differences?
I am using these eigenvectors to do PCA on a set of images (of faces). I sort the eigenvectors in descending order of their eigenvalues and this is multiplied with the original data of some images (a matrix) to obtain a facespace.
I've dealt with similar issues a lot: performing PCA on data where the dimensionality of the data is much greater than the number of data points (like images).
In this case, the maximum number of nontrivial eigenvectors of the covariance matrix of the data is min(dimension_of_data, number_of_data_points), so one always runs into the zero-eigenvalue problem; the matrix is thus always ill-conditioned, but that's not a problem in these cases.
Nevertheless, if you've got (say) 100 images that are each 100x100 pixels, to do PCA in the naive way you need to make a 10000x10000 covariance matrix and then decompose it into 10000 eigenvectors and values just to get out the 100 nontrivial ones. That's a lot of computation wasted calculating noise! Fortunately, there are better ways. One is to perform the SVD on the 100x10000 data matrix. Let the centered (mean-subtracted) data matrix be D; then the SVD provides matrices U, S, and V'. IIRC, the eigenvectors of D'D (the covariance matrix of interest) are then packed along the first dimension of V', and the eigenvalues are the squares of the values in S.
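A sketch of that SVD recipe, with toy sizes standing in for the 100x10000 case:

```python
import numpy as np

# Toy sizes stand in for 100 images of 100x100 pixels: n data points in
# k dimensions, n << k.
n, k = 20, 500
rng = np.random.default_rng(2)
D = rng.standard_normal((n, k))
D = D - D.mean(axis=0)                       # center (mean-subtract) the data

# Economy-size SVD of the data matrix itself: no k x k matrix is formed.
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Rows of Vt (= V') hold the eigenvectors of D'D; eigenvalues are s**2.
evals_from_svd = s ** 2

# Check against the direct, expensive eigendecomposition of D'D.
direct_top = np.linalg.eigvalsh(D.T @ D)[::-1][:n]
assert np.allclose(direct_top, evals_from_svd)
```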
But! There's an even faster way (from my testing). The trick is that instead of calculating the 10000x10000 outer covariance matrix D'D, or doing the SVD on D, one can calculate the 100x100 "inner" covariance matrix DD' and perform the eigendecomposition thereon and then trivially transform those eigenvalues and vectors to the ones of the D'D matrix. This computation is often substantially faster than the SVD.
Here's how it works. Let D, our recentered data matrix, be of shape (n, k), that is, n data points in k dimensions. We know that D has a singular value decomposition D = USV' (no need to calculate the SVD though; it's enough to know it exists). From this, we can rewrite the covariance matrices:
D'D = VS'SV'
DD' = USS'U'
Now, from the SVD, we know that S'S and SS' are diagonal matrices, and V and U (and V' and U') form orthogonal bases. One way to write the eigendecomposition of a matrix is A = QLQ', where Q is orthogonal and L is diagonal. Since the eigendecomposition is unique (up to a permutation of the columns of Q and L), we know that V must therefore contain the eigenvectors of D'D in its columns, and U must contain the eigenvectors of DD' in its columns. This is the origin of the SVD recipe that I gave above.
Further, let S_hat, of shape (k, n), be the transposed elementwise reciprocal of the nonzero entries of S (i.e. SS_hat = I of shape (n, n) and S_hatS = I of shape (k, k), where I is the identity matrix). Then we can solve for U or V in terms of the other:
V = D'US_hat'
U = DVS_hat
So, to get the eigenvectors and eigenvalues of D'D, we just calculate DD' and then apply the symmetric eigendecomposition (the symmetric version is faster, and DD' is symmetric) to get eigenvectors U and eigenvalues L. We know that L = SS', so S_hat = 1/sqrt(L) (where the sqrt is taken elementwise, of course). So the eigenvectors we're looking for are:
V = D'US_hat
The principal components (eigenvectors) are then in the columns of V (packed along the second dimension of V).
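A sketch of the whole inner-covariance trick with toy sizes, checked against the definition of an eigenvector (one extra practical detail: centering the data makes the smallest eigenvalue of DD' essentially zero, so it is dropped before dividing by sqrt(L)):

```python
import numpy as np

# Toy shapes: n data points in k dimensions, n << k.
n, k = 20, 500
rng = np.random.default_rng(3)
D = rng.standard_normal((n, k))
D = D - D.mean(axis=0)

inner = D @ D.T                          # small n x n "inner" covariance
L, U = np.linalg.eigh(inner)             # symmetric eigendecomposition
L, U = L[::-1], U[:, ::-1]               # reorder to descending eigenvalues

# Centering makes the last eigenvalue ~0; drop it to avoid dividing by ~0.
L, U = L[:-1], U[:, :-1]

# L = SS', so S_hat = 1/sqrt(L); transform to the eigenvectors of D'D:
V = D.T @ U / np.sqrt(L)                 # columns of V are the components

# Check: each column of V is an eigenvector of D'D with eigenvalue from L.
assert np.allclose(D.T @ D @ V, V * L)
```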
Fortunately, I've packaged this all up into a python module for PCA that takes care of this all. It's attached.
Zach Pincus
Postdoctoral Fellow, Lab of Dr. Frank Slack Molecular, Cellular and Developmental Biology Yale University
participants (6)
- Charles R Harris
- devnew@gmail.com
- Javier Maria Torres
- Matthieu Brucher
- Warren Focke
- Zachary Pincus