[Numpy-discussion] performance matrix multiplication vs. matlab

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jun 8 09:02:12 EDT 2009


On Mon, Jun 8, 2009 at 3:29 AM, Gael Varoquaux
<gael.varoquaux at normalesup.org> wrote:
> On Mon, Jun 08, 2009 at 08:58:29AM +0200, Matthieu Brucher wrote:
>> Given the number of PCs, I think you may just be measuring noise.
>> As stated in several manifold-reduction publications (such as those by
>> Torbjorn Vik, who published on robust PCA for medical imaging), you
>> cannot expect to have more than 4 or 5 meaningful PCs, due to the
>> curse of dimensionality. If you want 50 PCs, you have to have at least...
>> 10^50 samples, which is quite a lot, to put it mildly.
>> According to the literature, a typical manifold can be described by 4
>> or 5 variables. If you have more, you may be violating
>> your hypothesis, here the linearity of your data (and as this is medical
>> imaging, you know from the beginning that this hypothesis is wrong).
>> So if you really want to find something meaningful and/or physical,
>> you should use a real dimensionality reduction, preferably a
>> non-linear one.
>
> I am not sure I am following you: I have time-varying signals. I am not
> taking a shot of the same process over and over again. My intuition tells
> me that I have more than 5 meaningful patterns.
>
> Anyhow, I do some further analysis on top of that (ICA, actually), and I do
> find more than 5 patterns of interest that are not noise.

Just curious:
what's the actual shape of the array/data you run your PCA on?
Number of time periods, size of the cross section at each point in time?
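The shape matters because, for a data array with one row per time period and one column per cross-section element, PCA can return at most min(n_times, n_features) components, and the explained-variance ratios show how quickly the spectrum drops into noise. A minimal sketch, assuming that row/column layout; the helper name `pca_spectrum` and the synthetic 3-source data are hypothetical, made up for illustration only:

```python
import numpy as np

def pca_spectrum(data):
    """Explained-variance ratio of each principal component (hypothetical helper).

    Assumes `data` is 2-D with shape (n_times, n_features): rows are time
    periods (observations), columns are cross-section elements.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # remove each feature's mean
    # Singular values only; SVD returns min(n_times, n_features) of them,
    # sorted in descending order.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2  # proportional to the variance along each component
    return var / var.sum()

# Hypothetical synthetic data: 3 latent time courses mixed into 50 channels,
# plus a small amount of white noise.
rng = np.random.default_rng(0)
n_times, n_features = 200, 50
latent = rng.standard_normal((n_times, 3))
mixing = rng.standard_normal((3, n_features))
data = latent @ mixing + 0.1 * rng.standard_normal((n_times, n_features))

ratios = pca_spectrum(data)
print(ratios.shape)  # (50,)
print(ratios[:5].round(3))
```

With this layout, a few hundred time periods across tens of channels bounds the count at tens of components, and a spectrum that flattens after the first few ratios suggests the remainder is mostly noise.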

Josef


