[SciPy-User] PCA functions

josef.pktd at gmail.com josef.pktd at gmail.com
Thu May 20 11:21:22 EDT 2010


On Thu, May 20, 2010 at 9:38 AM, Vincent Davis <vincent at vincentdavis.net> wrote:

> On Thu, May 20, 2010 at 3:35 AM, Oliver Tomic <oliver.tomic at nofima.no> wrote:
>
> @Oliver, I posted your email over on the statsmodels list. I'll take a look
> at the link.
>

I briefly looked at the pca_module, and it already looks well written and
documented, although, from a very quick skim of the code and the list of
functions, I think the matrix versions are redundant.

statsmodels already has three implementations of PCA: one with eigh, one
with svd, and one wrapped in a class. There is also an example of how to do
Principal Component Regression. But we don't have NIPALS yet, or a version
that calculates only a few eigenvectors (with eigh). The current versions in
statsmodels are pretty basic; the eigh and svd versions are modeled after
and tested against Matlab's princomp.
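
As a rough illustration of the eigh and svd routes mentioned above, a basic
version might look like the sketch below. This is not the statsmodels code;
the function names pca_eigh and pca_svd and the ncomp argument are made up
for this example, and it assumes observations in rows and variables in
columns.

```python
import numpy as np


def pca_eigh(x, ncomp=None):
    """PCA via eigendecomposition of the sample covariance matrix (sketch)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                  # center each column
    cov = xc.T @ xc / (x.shape[0] - 1)       # sample covariance (ddof=1)
    evals, evecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]          # reorder to descending variance
    evals, evecs = evals[order], evecs[:, order]
    if ncomp is not None:
        evals, evecs = evals[:ncomp], evecs[:, :ncomp]
    scores = xc @ evecs                      # projections onto the components
    return scores, evecs, evals


def pca_svd(x, ncomp=None):
    """PCA via SVD of the centered data matrix (sketch)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    evals = s ** 2 / (x.shape[0] - 1)        # singular values -> variances
    evecs = vt.T                             # rows of vt are the principal axes
    if ncomp is not None:
        evals, evecs = evals[:ncomp], evecs[:, :ncomp]
    scores = xc @ evecs                      # same as u * s, up to truncation
    return scores, evecs, evals
```

Both functions return (scores, eigenvectors, eigenvalues). The two routes
should agree on the eigenvalues, while the eigenvectors and scores are only
determined up to the sign of each column.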

I think, as the discussions on the nipy and scipy list show, a basic PCA
version is easy to write, but everyone emphasizes different extras or
performance features, e.g. rotation would be nice for factor analysis.

I also think that scipy should have a basic version, just so we don't have
to figure out or remember how to do eigh or what all the different parts of
svd mean.

For statsmodels, I looked at this mainly for regressions in a "data-rich
environment", i.e. with lots of possible regressors.
For (unsupervised) dimension reduction we still have to figure out how it
fits in when pca gets out of the sandbox or when we expand in this area.
Also, I don't know if statsmodels will eventually get factor analysis. (I
have a multivariate analysis folder on my computer, but thought of leaving
this area to pymvpa.)
I stopped working on this for the moment, but I thought that a class that
makes the usage of PCA and the corresponding projections easy and
self-explanatory might be useful. E.g. for regression we need to be able to
rerun the regression with an increasing number of components, and we should
reuse previous calculations. Second, if there are different
implementations, we should have either automatic selection of the best one
given the arguments, or comparative documentation on when to use which
version.
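
A minimal sketch of what such a class could look like, assuming an SVD-based
decomposition that is computed once and then reused. The class name
PCASketch and its methods project and pcr_path are hypothetical and not part
of statsmodels. It also assumes the centered data has full column rank, so
that no singular value is zero.

```python
import numpy as np


class PCASketch:
    """Hypothetical helper: decompose once, reuse for any number of components."""

    def __init__(self, x):
        x = np.asarray(x, dtype=float)
        self.mean = x.mean(axis=0)
        u, s, vt = np.linalg.svd(x - self.mean, full_matrices=False)
        self.components = vt.T                   # columns are principal axes
        self.scores = u * s                      # projections of the training data
        self.evals = s ** 2 / (x.shape[0] - 1)   # variance of each component

    def project(self, x_new, ncomp):
        """Project new observations onto the first ncomp components."""
        x_new = np.asarray(x_new, dtype=float)
        return (x_new - self.mean) @ self.components[:, :ncomp]

    def pcr_path(self, y, max_comp):
        """Regress y on the first k scores for k = 1..max_comp.

        The score columns are mutually orthogonal and centered, so each
        slope is computed once and does not change as components are added.
        Returns a list of (k, slopes, residual sum of squares).
        """
        y = np.asarray(y, dtype=float)
        intercept = y.mean()
        f = self.scores[:, :max_comp]
        slopes = (f.T @ (y - intercept)) / (f ** 2).sum(axis=0)
        results = []
        for k in range(1, max_comp + 1):
            fitted = intercept + f[:, :k] @ slopes[:k]
            rss = ((y - fitted) ** 2).sum()
            results.append((k, slopes[:k].copy(), rss))
        return results
```

With this layout, PCASketch(x).pcr_path(y, 5) gives the regression results
for one to five components from a single decomposition, and project can be
used to get scores for new observations before predicting.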

Josef

http://tinyurl.com/2dwyjt8




>
> Vincent
>
>
>> Hi,
>>
>> I already sent this link to Mike (off-list, since my mails kept bouncing
>> back). A while ago I supervised a student who implemented various flavours
>> of PCA (using SVD and NIPALS, in Python and C respectively) as part of a
>> semester project. There is quite a bit of documentation coming with the PCA
>> module.
>>
>> http://folk.uio.no/henninri/pca_module/
>>
>>
>> I was considering asking the pystatmodels-group whether they are
>> interested in including this code; however, both code and documentation may
>> need a little bit of polishing first. Unfortunately, the code has no
>> procedure for validating the model. I plan to implement this if I ever
>> find the time.
>>
>
>> Cheers
>> Oliver
>>
>>
>>
>>
>>
>> -----scipy-user-bounces at scipy.org wrote: -----
>>
>> >To: SciPy Users List <scipy-user at scipy.org>
>> >From: Sean Arms <lesserwhirls at gmail.com>
>> >Sent by: scipy-user-bounces at scipy.org
>> >Date: 05/19/2010 06:04PM
>> >Subject: Re: [SciPy-User] PCA functions
>>
>> >
>> >Greetings Mike,
>> >
>> >     Are you looking for just the PCA decomposition, or are you
>> >wanting to rotate the truncated PC's using something like promax,
>> >varimax, etc.?  If so, I do not think MDP or NiPy have that
>> >capability.  I have functions to do some of the basic rotations, and
>> >I've tested them against S+ and Matlab if you are looking for that
>> >functionality, but I'll probably need to clean them up a bit :-)
>> >
>> >Sean
>> >
>> >On Wed, May 19, 2010 at 9:53 AM, Michael Hull
>> ><mikehulluk at googlemail.com> wrote:
>> >> Hi Everybody,
>> >> I am doing some work using numpy/scipy and wanted to find the
>> >> principal components for some data. I can write a fairly simple
>> >> function to do this, but was wondering if there was already a
>> >> function in scipy to do this that I hadn't found, before
>> >> re-inventing the wheel.
>> >>
>> >> Many thanks,
>> >>
>> >>
>> >> Mike Hull
>> >> _______________________________________________
>> >> SciPy-User mailing list
>> >> SciPy-User at scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/scipy-user
>> >>
>>
>>
>>
>>
>
>
>