On 11/15/2011 05:46 PM, Andreas Müller wrote:
On 11/15/2011 04:28 PM, Bruce Southey wrote:
On 11/14/2011 10:05 AM, Andreas Müller wrote:
On 11/14/2011 04:23 PM, David Cournapeau wrote:
On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller
<amueller@ais.uni-bonn.de>  wrote:
Hi everybody.
When I did some normalization using numpy, I noticed that numpy.std uses
more RAM than I was expecting.
A quick google search gave me this:
http://luispedro.org/software/ncreduce
The site claims that std and other reduce operations are implemented
naively with many temporaries.
Is that true? And if so, is there a particular reason for that?
This issue seems quite easy to fix.
In particular, the link I gave above provides code.
The code provided only implements a few special cases: being more
efficient in those cases only is indeed easy.
I am particularly interested in the std function.
Is this implemented as a separate function or as an instantiation
of a general reduce operation?

The 'on-line algorithm' (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm) could save you storage. I would presume that if you know Cython, you can also make it fast (to address the Python loop over the data).
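
For concreteness, here is a minimal pure-Python sketch of that on-line
(Welford) recurrence; the function name and the ddof handling are my own
choices, and a real implementation would push the loop into C or Cython:

import numpy as np

def online_std(x, ddof=0):
    """Standard deviation via the on-line (Welford) recurrence.

    Only scalar running state (count, mean, M2) is kept, so the extra
    memory does not grow with the number of elements.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for value in np.ravel(x):
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    return np.sqrt(m2 / (count - ddof))

e.g. online_std(np.random.rand(10**6)) agrees with np.std up to
floating-point rounding while touching the data only once.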


My question was more along the lines of "why doesn't numpy do the online algorithm".

To be more precise, even without the online version, computing the variance from E(X^2) and E(X)^2 would be good.
It seems numpy centers the whole dataset; otherwise I can't explain why the memory needed should depend
on the number of examples.
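
A rough sketch of what I mean, assuming the centering explanation is right
(the variable names below are just for illustration): the centered form
materializes a temporary array the size of the input, while the moment-based
form only needs scalar accumulators, at the cost of some numerical robustness:

import numpy as np

x = np.random.rand(10**7)

# Two-pass, centered form: `x - x.mean()` materializes a temporary array
# the same size as x, so peak memory grows with the number of examples.
std_centered = np.sqrt(np.mean((x - x.mean()) ** 2))

# Moment-based form: np.dot(x, x) accumulates the sum of squares without
# an n-sized temporary, so the extra memory stays constant. It is less
# robust numerically when the mean is large relative to the spread.
mean = x.mean()
std_moments = np.sqrt(np.dot(x, x) / x.size - mean ** 2)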