[Numpy-discussion] performance matrix multiplication vs. matlab

Gael Varoquaux gael.varoquaux at normalesup.org
Mon Jun 8 03:29:39 EDT 2009

On Mon, Jun 08, 2009 at 08:58:29AM +0200, Matthieu Brucher wrote:
> Given the number of PCs, I think you may just be measuring noise.
> As said in several manifold reduction publications (as the ones by
> Torbjorn Vik who published on robust PCA for medical imaging), you
> cannot expect to have more than 4 or 5 meaningful PCs, due to the
> dimensionality curse. If you want 50 PCs, you have to have at least...
> 10^50 samples, which is quite a lot, let's say it this way.
> According to the literature, a usual manifold can be described by 4
> or 5 variables. If you have more, it is that you may be infringing
> your hypothesis, here the linearity of your data (and as it is medical
> imaging, you know from the beginning that this hypothesis is wrong).
> So if you really want to find something meaningful and/or physical,
> you should use a real dimensionality reduction, preferably a
> non-linear one.

I am not sure I am following you: I have time-varying signals. I am not
taking a shot of the same process over and over again. My intuition tells
me that I have more than 5 meaningful patterns.

Anyhow, I do some more analysis after that (ICA actually), and I do find
more than 5 patterns of interest that are not noise.

So maybe I should be using some non-linear dimensionality reduction, but
what I am doing works, and I can write a generative model of it. Most
importantly, it is actually quite computationally simple.
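
To make the "more than 5 meaningful components" point concrete, here is a
minimal sketch on synthetic data, not the actual analysis from this thread.
Everything in it is an assumption made for illustration: the helper name
pca_spectrum, the 8 sinusoidal sources, the random mixing matrix, the noise
level, and the "10 times the median" cutoff, which is an arbitrary
heuristic. It only shows that a linear mixture of several independent
time-varying signals can leave more than 5 components clearly above the
noise floor in a PCA spectrum.

```python
import numpy as np


def pca_spectrum(X):
    """Fraction of variance carried by each principal component.

    X has shape (n_samples, n_features); rows are time points.
    (Hypothetical helper, written for this illustration only.)
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()


rng = np.random.default_rng(0)
n_samples, n_features, n_sources = 2000, 100, 8

# Assumed sources: sinusoids at distinct frequencies, one per column.
t = np.linspace(0.0, 1.0, n_samples)
sources = np.stack(
    [np.sin(2 * np.pi * 3 * (k + 1) * t) for k in range(n_sources)], axis=1
)

# Linear generative model: observations = sources @ mixing + noise.
mixing = rng.normal(size=(n_sources, n_features))
X = sources @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

ratio = pca_spectrum(X)

# Arbitrary cutoff: count components well above the typical (noise) level.
n_above = int(np.sum(ratio > 10 * np.median(ratio)))
print("components above the noise floor:", n_above)
```

On real data the gap between signal and noise components is rarely this
clean, so the cutoff would need to be chosen with more care (for instance
by cross-validation or by comparing against shuffled data).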

However, if you can point me to methods that you believe are better (and
tell me why you believe so), I am all ears.

