[Numpy-discussion] Memory hungry reduce ops in Numpy
amueller at ais.uni-bonn.de
Tue Nov 15 11:48:44 EST 2011
On 11/15/2011 05:46 PM, Andreas Müller wrote:
> On 11/15/2011 04:28 PM, Bruce Southey wrote:
>> On 11/14/2011 10:05 AM, Andreas Müller wrote:
>>> On 11/14/2011 04:23 PM, David Cournapeau wrote:
>>>> On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller
>>>> <amueller at ais.uni-bonn.de> wrote:
>>>>> Hi everybody.
>>>>> When I did some normalization using numpy, I noticed that numpy.std uses
>>>>> more RAM than I was expecting.
>>>>> A quick google search gave me this:
>>>>> The site claims that std and other reduce operations are implemented
>>>>> naively with many temporaries.
>>>>> Is that true? And if so, is there a particular reason for that?
>>>>> This issue seems quite easy to fix.
>>>>> In particular the link I gave above provides code.
>>>> The code provided only implements a few special cases: being more
>>>> efficient in those cases only is indeed easy.
>>> I am particularly interested in the std function.
>>> Is this implemented as a separate function or an instantiation
>>> of a general reduce operation?
>> The 'On-line algorithm'
>> could save you storage. I would presume if you know cython that you
>> can probably make it quick as well (to address the loop over the data).
> My question was more along the lines of "why doesn't numpy do the
> online algorithm".
To be more precise: even without the online version, computing the
variance as E(X^2) - E(X)^2 would already be good.
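
A minimal sketch of that idea, assuming the data is a 2-D array with one
example per row and the standard deviation is wanted per column. The
function name std_from_moments and the chunk_size parameter are made up
for illustration and are not part of numpy. Temporaries are limited to
one chunk, so memory does not grow with the number of examples.

```python
import numpy as np


def std_from_moments(X, chunk_size=1024):
    """Std along axis 0 via E(X^2) - E(X)^2, accumulated chunk by chunk.

    Hypothetical helper, not a numpy function.
    """
    X = np.asarray(X)
    n = X.shape[0]
    total = np.zeros(X.shape[1:], dtype=np.float64)      # running sum of x
    total_sq = np.zeros(X.shape[1:], dtype=np.float64)   # running sum of x^2
    for start in range(0, n, chunk_size):
        # Only a chunk-sized temporary is created here.
        chunk = X[start:start + chunk_size].astype(np.float64, copy=False)
        total += chunk.sum(axis=0)
        total_sq += (chunk * chunk).sum(axis=0)
    mean = total / n
    var = total_sq / n - mean * mean
    # Rounding can push a tiny variance slightly below zero.
    var = np.maximum(var, 0.0)
    return np.sqrt(var)
```

One caveat: this formula subtracts two numbers of similar size, so it
loses precision when the mean is large compared to the spread of the
data.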
It seems numpy centers the whole dataset. Otherwise I can't explain why
the memory needed should depend on the number of examples.
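
For comparison, the on-line algorithm mentioned earlier in the thread can
be sketched as below. This is Welford's update written in plain
Python/numpy, not what numpy.std does internally. The name online_std is
made up for illustration. It keeps only a running mean and a running sum
of squared deviations, each the size of one example, and it returns the
population standard deviation (dividing by n), which matches the default
of numpy.std.

```python
import numpy as np


def online_std(rows):
    """Std over an iterable of examples using Welford's online update.

    Hypothetical helper, not a numpy function.
    """
    n = 0
    mean = None
    m2 = None  # running sum of squared deviations from the current mean
    for row in rows:
        row = np.asarray(row, dtype=np.float64)
        if mean is None:
            mean = np.zeros_like(row)
            m2 = np.zeros_like(row)
        n += 1
        delta = row - mean
        mean += delta / n
        # Uses the old mean (in delta) and the updated mean.
        m2 += delta * (row - mean)
    if n == 0:
        raise ValueError("need at least one example")
    return np.sqrt(m2 / n)
```

The Python-level loop over examples is slow, which is the part the
Cython suggestion above is meant to address.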