PCA principal component analysis

sebastien s.thuriez at laposte.net
Thu Apr 10 10:42:12 CEST 2003


Alexander Schmolck <a.schmolck at gmx.net> wrote in message news:<yfs7ka3iom2.fsf at black132.ex.ac.uk>...
> s.thuriez at laposte.net (sebastien) writes:
> 
> > Hi,
> > 
> > Are there any PCA tools for Python?
> 
> What is the analysis tool supposed to do?
> 
> Maybe this will do what you want, once you have downloaded and installed Numeric:
> 
> # Warning: hackish and not properly tested ripped out bit of code ahead
> # so no guarantees whatsoever
> # Anyway, it should at least sort of give you the idea
> # try pca(X); if that doesn't do what you want try pca(t(X))
> 
> from Numeric import take, dot, shape, argsort, where, sqrt, transpose as t
> from LinearAlgebra import eigenvectors
> 
> def pca(M):
>     "Perform PCA on M, return eigenvectors and eigenvalues, sorted."
>     T, N = shape(M)
>     # if there are fewer rows T than columns N, use
>     # snapshot method
>     if T < N:
>         C = dot(M, t(M))
>         evals, evecsC = eigenvectors(C)
>         # HACK: make sure evals are all positive
>         evals = where(evals < 0, 0, evals)
>         evecs = 1./sqrt(evals) * dot(t(M), t(evecsC))
>     else:
>         # calculate covariance matrix
>         K = 1./T * dot(t(M), M)
>         evals, evecs = eigenvectors(K)
>     # sort the eigenvalues and eigenvectors, descending order
>     order = (argsort(evals)[::-1])
>     evecs = take(evecs, order, 1)
>     evals = take(evals, order)
>     return evals, t(evecs)
> 
> 
> You can download Numeric and use it to compute the eigenvalues and
> eigenvectors of an array.
> 
> 
> > If it does, do you have any idea of how well it would scale?
> 
> It should scale fine. If you experience speed problems, configure Numeric 23
> with ATLAS support (you have to install ATLAS and LAPACK first, of course).
> For large matrices, this should be *much* faster than handwritten C code that
> doesn't use ATLAS.
> 
> > 
> > I have already seen PyClimate (but it is not available for Windows,
> > which will be one of the targets). Are there any LAPACK-like packages?
> 
> Yes, Numeric and scipy. (www.numpy.org, www.scipy.org, I should think)
> 
> 'as


Hi,

Thanks for the program. I have tested it with a test matrix that I
took from the IDL (Interactive Data Language) help.

The matrix is:

((2,1,3),
(4,2,3),
(4,1,0),
(2,3,3),
(5,1,9))


The principal components given by IDL are:

((0.87, -0.7, 0.69),
(0.01, -0.64, -0.66),
(0.49, 0.32, -0.30))

Running the same matrix through the program that you kindly provided
gives:

((-0.58, -0.67, 0.45),
(-0.23, -0.40, -0.88),
(-0.77, -0.62, -0.074))

Do you know why there is such a difference?

I have not yet had time to test it against other programs.
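
One way to narrow this down is to rule out differences in convention before
suspecting a bug. The sketch below is only an illustration: it assumes NumPy
(numpy.cov and numpy.linalg.eigh) rather than the Numeric code quoted above,
and the names pca_rows, same_up_to_sign and the standardize flag are made up
for this example. It centres each column, optionally scales the columns to
unit variance (which amounts to using the correlation matrix instead of the
covariance matrix), and returns unit-length eigenvectors as rows. Which of
these conventions IDL follows is not something I have checked. An eigenvector
is also only defined up to sign, so two correct programs can differ by a
factor of -1 per component.

```python
import numpy as np


def pca_rows(X, standardize=False):
    """PCA of X, with rows as observations and columns as variables.

    Returns (eigenvalues in descending order, eigenvectors as rows).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each column
    if standardize:
        # Unit variance per column: equivalent to correlation-matrix PCA.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    C = np.cov(Xc, rowvar=False)            # N x N matrix, normalised by T - 1
    evals, evecs = np.linalg.eigh(C)        # symmetric input, ascending order
    order = np.argsort(evals)[::-1]         # reorder to descending eigenvalues
    return evals[order], evecs[:, order].T  # eigenvectors as rows


def same_up_to_sign(a, b, tol=1e-8):
    """True if vectors a and b are equal or differ only by an overall sign."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return bool(np.allclose(a, b, atol=tol) or np.allclose(a, -b, atol=tol))


# Test matrix from the message above.
X = [(2, 1, 3), (4, 2, 3), (4, 1, 0), (2, 3, 3), (5, 1, 9)]

evals_cov, evecs_cov = pca_rows(X)                    # covariance convention
evals_cor, evecs_cor = pca_rows(X, standardize=True)  # correlation convention
```

Comparing evecs_cov and evecs_cor (and their transposes) against each
program's output, using same_up_to_sign for each component, should show
whether the two results differ only by centring, scaling, layout or sign.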

Thanks.

Sebastien.



