[Numpy-discussion] Sum, multiply are slow ?

David Cournapeau david at ar.media.kyoto-u.ac.jp
Thu Jul 12 02:33:03 EDT 2007

Travis Oliphant wrote:
> David Cournapeau wrote:
>> Hi,
>>     While profiling some code, I noticed that sum in numpy is kind of 
>> slow once you use axis argument:
> Yes, this is expected because when using an axis argument, the 
> following two things can happen:
> 1) You may be skipping over large chunks of memory to get to the next 
> available number and out-of-cache memory access is slow.
> 2) You have to allocate a result array.
>> import numpy as N
>> a = N.random.randn(100000, 30)
>> %timeit N.sum(a) #-> 26.8ms
>> %timeit N.sum(a, 1) #-> 65.5ms
>> %timeit N.sum(a, 0) #-> 141ms
>> Now, if I use some tricks, I get:
>> %timeit N.sum(a) #-> 26.8 ms
>> %timeit N.dot(a, N.ones(a.shape[1], a.dtype)) #-> 11.3ms
>> %timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a) #-> 15.5ms
>> I realize that dot uses optimized libraries (atlas in my case) and all, 
>> but is there any way to improve this situation ?
> Sum does *not* use an optimized library so it is not too surprising that 
> you can get speed-ups using ATLAS. 
I understand that there is no optimization going on with sum or 
multiply. This was just to have a comparison (this kind of thing varies 
*a lot* across CPUs of the same architecture).
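
For anyone who wants to repeat the comparison outside IPython, here is a
rough sketch. It assumes a recent NumPy imported as `np` (the session above
used `N`), an integer shape, and a 1-D vector of ones for both products.
The helper name `bench` is hypothetical. No timings are shown because they
depend heavily on the BLAS library and the CPU.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100000, 30))

# Vectors of ones turn a sum along an axis into a matrix-vector product.
ones_cols = np.ones(a.shape[1], dtype=a.dtype)
ones_rows = np.ones(a.shape[0], dtype=a.dtype)

row_sums_dot = np.dot(a, ones_cols)   # same values as np.sum(a, 1)
col_sums_dot = np.dot(ones_rows, a)   # same values as np.sum(a, 0)


def bench(func, number=20):
    """Hypothetical helper: average seconds per call of a zero-argument callable."""
    return timeit.timeit(func, number=number) / number


timings = {
    "sum(a, 1)": bench(lambda: np.sum(a, 1)),
    "dot(a, ones)": bench(lambda: np.dot(a, ones_cols)),
    "sum(a, 0)": bench(lambda: np.sum(a, 0)),
    "dot(ones, a)": bench(lambda: np.dot(ones_rows, a)),
}

for label, seconds in timings.items():
    print(f"{label:>14}: {seconds * 1e3:.3f} ms")
```

The two approaches agree only up to floating-point rounding, since the
additions happen in a different order, so compare them with `np.allclose`
rather than exact equality.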
>  It would be nice to do something to 
> optimize the reduction functions in NumPy, but nobody has come forward 
> with suggestions yet.
So is it possible to improve things? I noticed that sum, multiply and 
co. are using reduction functions. Should I follow the same scheme as 
what I did for clip (following the dot-related optimization, basically)?
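
To make "reduction functions" concrete, here is a small sketch (assuming a
recent NumPy imported as `np`) of the ufunc `reduce` methods that sum and
the product along an axis correspond to. The array is a toy example chosen
only for illustration, and the strides line is there to show how far apart
consecutive elements along each axis sit in memory.

```python
import numpy as np

# Small C-contiguous float64 array: 3 rows, 4 columns.
a = np.arange(12, dtype=float).reshape(3, 4)

# Summing over axis 0 is the add ufunc reduced along that axis.
col_sums = np.add.reduce(a, axis=0)

# The product along axis 1 is the multiply ufunc reduced along that axis.
row_prods = np.multiply.reduce(a, axis=1)

print(col_sums)    # same values as np.sum(a, axis=0)
print(row_prods)   # same values as np.prod(a, axis=1)

# Byte step between consecutive elements along axis 0 and axis 1.
print(a.strides)
```

For this layout the step along axis 0 is one full row, while the step along
axis 1 is one element, which is the memory-access pattern mentioned in the
quoted reply.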

