PCA principal component analysis
s.thuriez at laposte.net
Thu Apr 10 10:42:12 CEST 2003
Alexander Schmolck <a.schmolck at gmx.net> wrote in message news:<yfs7ka3iom2.fsf at black132.ex.ac.uk>...
> s.thuriez at laposte.net (sebastien) writes:
> > Hi,
> > Are there any PCA analysis tools for Python?
> What is the analysis tool supposed to do?
> Maybe this will do what you want, once you have downloaded and installed Numeric:
> # Warning: hackish, not properly tested, ripped-out bit of code ahead,
> # so no guarantees whatsoever.
> # Anyway, it should at least sort of give you the idea.
> # Try pca(X); if that doesn't do what you want, try pca(t(X)).
> from Numeric import take, dot, shape, argsort, where, sqrt, transpose as t
> from LinearAlgebra import eigenvectors
> def pca(M):
>     "Perform PCA on M, return eigenvalues and eigenvectors, sorted."
>     T, N = shape(M)
>     # if there are fewer rows T than columns N, use
>     # snapshot method
>     if T < N:
>         C = dot(M, t(M))
>         evals, evecsC = eigenvectors(C)
>         # HACK: make sure evals are all positive
>         evals = where(evals < 0, 0, evals)
>         evecs = 1./sqrt(evals) * dot(t(M), t(evecsC))
>     else:
>         # calculate covariance matrix
>         K = 1./T * dot(t(M), M)
>         evals, evecs = eigenvectors(K)
>     # sort the eigenvalues and eigenvectors, descending order
>     order = (argsort(evals)[::-1])
>     evecs = take(evecs, order, 1)
>     evals = take(evals, order)
>     return evals, t(evecs)
> You can download Numeric and use it to compute the eigenvalues and
> eigenvectors of an array.
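
For readers without Numeric, the same idea can be sketched with NumPy. This is an assumption on my part, since the quoted code uses Numeric and LinearAlgebra, and it is a sketch rather than a tested replacement. It differs from the quoted code in three labelled ways: it subtracts the column means by default (the `center` flag is a hypothetical addition; `center=False` keeps the uncentered behaviour of the quoted code), it divides by T in both branches so the eigenvalues agree, and it drops near-zero eigenvalues in the snapshot branch instead of dividing by them.

```python
import numpy as np

def pca(M, center=True):
    """Return (evals, evecs), sorted by descending eigenvalue.

    Rows of M are observations, columns are variables.
    Each row of the returned evecs is one unit-length component.
    """
    M = np.asarray(M, dtype=float)
    if center:
        # Assumption: centre each column; the quoted code does not do this.
        M = M - M.mean(axis=0)
    T, N = M.shape
    if T < N:
        # Snapshot method: solve the smaller T x T problem.
        C = M @ M.T / T
        evals, evecsC = np.linalg.eigh(C)
        # Drop numerically zero (or slightly negative) eigenvalues.
        keep = evals > 1e-12 * evals.max()
        evals, evecsC = evals[keep], evecsC[:, keep]
        # Map back to N-dimensional eigenvectors of M^T M / T, unit length.
        evecs = (M.T @ evecsC) / np.sqrt(T * evals)
    else:
        # Covariance matrix (N x N), normalised by T as in the quoted code.
        K = M.T @ M / T
        evals, evecs = np.linalg.eigh(K)
    # eigh returns ascending order; reverse to descending.
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order].T
```

Called as `evals, evecs = pca(X)`, this returns the components as rows, matching the `t(evecs)` in the quoted return statement. The sign of each component is arbitrary, so results from two programs may differ by a factor of -1 per row.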
> > If it does, do you have any idea of how well it would scale?
> It should scale fine. If you experience speed problems, configure Numeric 23
> with ATLAS support (you have to install ATLAS and LAPACK first, of course).
> For large matrices, this should be *much* faster than handwritten C code that
> doesn't use ATLAS.
> > I have already seen PyClimate (but it is not available for Windows,
> > which will be one of the targets). Are there any LAPACK-like packages?
> Yes, Numeric and scipy. (www.numpy.org, www.scipy.org, I should think)
Thanks for the program. I have tested it with a test matrix taken from
the IDL (Interactive Data Language) help.
The matrix is:
The principal components given by IDL are:
((0.87, -0.7, 0.69),
(0.01, -0.64, -0.66),
(0.49, 0.32, -0.30))
Running the same matrix through the program that you kindly provided gives:
((-0.58, -0.67, 0.45),
(-0.23, -0.40, -0.88),
(-0.77, -0.62, -0.074))
Do you know why there is such a difference?
I have not yet had time to test it against other programs.
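
One check that needs only the two sets of numbers quoted above is whether both results use the same normalisation. The sketch below, in plain Python, computes the sum of squares of each row; the helper name `row_sums_of_squares` is made up for illustration, and the values are the rounded ones from this message.

```python
# Values copied from the message above (rounded as posted).
idl_rows = [
    (0.87, -0.70, 0.69),
    (0.01, -0.64, -0.66),
    (0.49, 0.32, -0.30),
]
pca_rows = [
    (-0.58, -0.67, 0.45),
    (-0.23, -0.40, -0.88),
    (-0.77, -0.62, -0.074),
]

def row_sums_of_squares(rows):
    """Return the sum of squared entries of each row."""
    return [sum(x * x for x in row) for row in rows]

idl_sq = row_sums_of_squares(idl_rows)
pca_sq = row_sums_of_squares(pca_rows)

print("IDL rows:", [round(s, 3) for s in idl_sq], "total", round(sum(idl_sq), 3))
print("pca rows:", [round(s, 3) for s in pca_sq], "total", round(sum(pca_sq), 3))
```

The pca() rows each come out close to 1, so they are unit vectors, while the IDL rows do not and their total is close to 3, the number of variables. The eigenvalues of a 3x3 correlation matrix also sum to 3, so this pattern would be consistent with IDL working on the correlation matrix and scaling each component by the square root of its eigenvalue. That is only a guess from the numbers, not something checked against the IDL documentation. The pca() code also does not subtract the column means, which could account for part of the difference.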