[Numpy-discussion] Memory hungry reduce ops in Numpy

Warren Weckesser warren.weckesser at enthought.com
Tue Nov 15 12:02:06 EST 2011

On Tue, Nov 15, 2011 at 10:48 AM, Andreas Müller
<amueller at ais.uni-bonn.de> wrote:

> On 11/15/2011 05:46 PM, Andreas Müller wrote:
> On 11/15/2011 04:28 PM, Bruce Southey wrote:
> On 11/14/2011 10:05 AM, Andreas Müller wrote:
> On 11/14/2011 04:23 PM, David Cournapeau wrote:
>  On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller <amueller at ais.uni-bonn.de> wrote:
>  Hi everybody.
> When I did some normalization using numpy, I noticed that numpy.std uses
> more RAM than I was expecting.
> A quick Google search gave me this: http://luispedro.org/software/ncreduce
> The site claims that std and other reduce operations are implemented
> naively with many temporaries.
> Is that true? And if so, is there a particular reason for that?
> This issue seems quite easy to fix.
> In particular the link I gave above provides code.
>  The code provided only implements a few special cases: being more
> efficient in those cases only is indeed easy.
>  I am particularly interested in the std function.
> Is this implemented as a separate function or as an instantiation
> of a general reduce operation?
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>  The 'On-line algorithm'
> (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm)
> could save you storage. I would presume if you know cython that you can
> probably make it quick as well (to address the loop over the data).
> My question was more along the lines of "why doesn't numpy do the online
> algorithm".
>  To be more precise, even without the online version, computing
> E(X^2) and E(X)^2 would be good.
> It seems numpy centers the whole dataset. Otherwise I can't explain why
> the memory needed should depend on the number of examples.

Yes, that is what it is doing.   See line 63 in the function _var(), which
is called by _std():
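
As an illustration only (this is a sketch, not the NumPy source), the
centering approach amounts to something like std_centered() below, which
allocates temporaries as large as the input. The second function,
std_chunked(), is one possible way to bound the extra memory: it merges
per-chunk (count, mean, sum of squared deviations) statistics, in the spirit
of the on-line algorithm mentioned above. Both function names and the
chunk_size parameter are made up for this example, and it only handles the
flattened, ddof=0 case.

```python
import numpy as np


def std_centered(x):
    """Std via centering the whole array (full-size temporaries)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()                        # temporary as large as x
    return np.sqrt((centered * centered).mean())   # second full-size temporary


def std_chunked(x, chunk_size=65536):
    """Std of the flattened array; temporaries hold at most chunk_size items.

    Hypothetical helper: merges per-chunk (count, mean, M2) statistics,
    where M2 is the sum of squared deviations from the mean.
    """
    x = np.asarray(x, dtype=float).ravel()
    if x.size == 0:
        return float("nan")
    n = 0
    mean = 0.0
    m2 = 0.0
    for start in range(0, x.size, chunk_size):
        chunk = x[start:start + chunk_size]        # a view, no copy
        n_b = chunk.size
        mean_b = chunk.mean()
        m2_b = ((chunk - mean_b) ** 2).sum()       # chunk-sized temporary only
        delta = mean_b - mean
        total = n + n_b
        # Merge the chunk statistics into the running statistics.
        mean += delta * n_b / total
        m2 += m2_b + delta * delta * n * n_b / total
        n = total
    return np.sqrt(m2 / n)                         # ddof=0, like np.std's default
```

The chunked version trades one large temporary for a Python-level loop over
chunks, so it keeps the numerical stability of centering (unlike the plain
E(X^2) - E(X)^2 formula, which can lose precision when the mean is large
relative to the spread) while its extra memory no longer depends on the
number of examples.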

