Sum, multiply are slow?

Hi,

While profiling some code, I noticed that sum in numpy is kind of slow once you use the axis argument:

import numpy as N
a = N.random.randn(1e5, 30)
%timeit N.sum(a)     #-> 26.8ms
%timeit N.sum(a, 1)  #-> 65.5ms
%timeit N.sum(a, 0)  #-> 141ms

Now, if I use some tricks, I get:

%timeit N.sum(a)                                     #-> 26.8ms
%timeit N.dot(a, N.ones(a.shape[1], a.dtype))        #-> 11.3ms
%timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a)   #-> 15.5ms

I realize that dot uses optimized libraries (atlas in my case) and all, but is there any way to improve this situation?

Cheers,
David
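For readers without IPython's %timeit, the benchmark above can be reproduced with the standard timeit module. A sketch (absolute timings are machine-dependent; note that current NumPy wants an integer shape, so 100000 is used instead of 1e5):

```python
import timeit

# Same benchmark as in the message above, using the stdlib timeit
# module instead of IPython's %timeit. Timings vary across machines.
setup = "import numpy as np; a = np.random.randn(100000, 30)"
statements = (
    "np.sum(a)",
    "np.sum(a, 1)",
    "np.sum(a, 0)",
    "np.dot(a, np.ones(a.shape[1], a.dtype))",
    "np.dot(np.ones((1, a.shape[0]), a.dtype), a)",
)
for stmt in statements:
    # Best of 3 repeats of 10 runs each, reported per run.
    t = min(timeit.repeat(stmt, setup=setup, number=10, repeat=3)) / 10
    print(f"{stmt:<48} {t * 1e3:6.1f} ms")
```

The dot-based expressions compute the same values as the axis sums, which is what makes the comparison meaningful.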

Very interesting. However, it would be better if you provided the exact code; I haven't used timeit and have some trouble with the module.
Regards,
D.

David Cournapeau wrote:
Hi,
While profiling some code, I noticed that sum in numpy is kind of slow once you use axis argument:
import numpy as N
a = N.random.randn(1e5, 30)
%timeit N.sum(a)     #-> 26.8ms
%timeit N.sum(a, 1)  #-> 65.5ms
%timeit N.sum(a, 0)  #-> 141ms
Now, if I use some tricks, I get:
%timeit N.sum(a)                                     #-> 26.8ms
%timeit N.dot(a, N.ones(a.shape[1], a.dtype))        #-> 11.3ms
%timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a)   #-> 15.5ms
I realize that dot uses optimized libraries (atlas in my case) and all, but is there any way to improve this situation ?
Cheers,
David

David Cournapeau wrote:
Hi,
While profiling some code, I noticed that sum in numpy is kind of slow once you use axis argument:
Yes, this is expected, because when using an axis argument the following two things can happen:
1) You may be skipping over large chunks of memory to get to the next available number, and out-of-cache memory access is slow.
2) You have to allocate a result array.
import numpy as N
a = N.random.randn(1e5, 30)
%timeit N.sum(a)     #-> 26.8ms
%timeit N.sum(a, 1)  #-> 65.5ms
%timeit N.sum(a, 0)  #-> 141ms
Now, if I use some tricks, I get:
%timeit N.sum(a)                                     #-> 26.8ms
%timeit N.dot(a, N.ones(a.shape[1], a.dtype))        #-> 11.3ms
%timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a)   #-> 15.5ms
I realize that dot uses optimized libraries (atlas in my case) and all, but is there any way to improve this situation ?
Sum does *not* use an optimized library, so it is not too surprising that you can get speed-ups using ATLAS. It would be nice to do something to optimize the reduction functions in NumPy, but nobody has come forward with suggestions yet. Thanks for the reports, though.
-Travis
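Point (1) above can be seen directly from the strides of a C-contiguous array: consecutive elements along the last axis are adjacent in memory, so reducing over axis 1 reads memory sequentially, while reducing over axis 0 jumps a whole row at a time. A small sketch:

```python
import numpy as np

# For a C-contiguous float64 array of shape (100000, 30), stepping along
# axis 1 moves 8 bytes (one element), while stepping along axis 0 moves
# 30 * 8 = 240 bytes, i.e. it skips over a whole row each time.
a = np.random.randn(100000, 30)
print(a.strides)  # (240, 8) on a standard 64-bit float build

# A Fortran-ordered copy swaps which axis is the memory-contiguous one,
# so the cache-friendly reduction axis swaps as well.
b = np.asfortranarray(a)
print(b.strides)  # (8, 800000)
```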

Travis Oliphant wrote:
David Cournapeau wrote:
Hi,
While profiling some code, I noticed that sum in numpy is kind of slow once you use axis argument:
Yes, this is expected, because when using an axis argument the following two things can happen:
1) You may be skipping over large chunks of memory to get to the next available number and out-of-cache memory access is slow.
2) You have to allocate a result array.
import numpy as N
a = N.random.randn(1e5, 30)
%timeit N.sum(a)     #-> 26.8ms
%timeit N.sum(a, 1)  #-> 65.5ms
%timeit N.sum(a, 0)  #-> 141ms
Now, if I use some tricks, I get:
%timeit N.sum(a)                                     #-> 26.8ms
%timeit N.dot(a, N.ones(a.shape[1], a.dtype))        #-> 11.3ms
%timeit N.dot(N.ones((1, a.shape[0]), a.dtype), a)   #-> 15.5ms
I realize that dot uses optimized libraries (atlas in my case) and all, but is there any way to improve this situation ?
Sum does *not* use an optimized library so it is not too surprising that you can get speed-ups using ATLAS.
I understand that there is no optimization going on with sum or multiply. This was just to have a comparison (this kind of thing varies *a lot* across CPUs of the same architecture).
It would be nice to do something to optimize the reduction functions in NumPy, but nobody has come forward with suggestions yet.
So it is possible to improve things? I noticed that sum/multiply and co are using reduction functions. Should I follow the same scheme as what I did for clip (following the dot-related optimizations, basically)?
David

On 7/12/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
While profiling some code, I noticed that sum in numpy is kind of slow once you use axis argument:
Here is a related thread: http://projects.scipy.org/pipermail/numpy-discussion/2007-February/025903.ht...

Keith Goodman wrote:
On 7/12/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
While profiling some code, I noticed that sum in numpy is kind of slow once you use axis argument:
Here is a related thread: http://projects.scipy.org/pipermail/numpy-discussion/2007-February/025903.ht...
Thanks, I remembered there was something about that a few months ago but could not find it. From a quick look at the code for PyArray_Sum, this seems to have nothing to do with caching or looping, but with the way that summing is implemented (a generic reduce).
David
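The dot trick discussed in the thread can be wrapped as a helper so the axis sum goes through the optimized BLAS (e.g. ATLAS) that dot uses, rather than the generic reduce loop. A sketch; the name fast_sum and its restriction to 2-D arrays are illustrative, not anything in NumPy:

```python
import numpy as np

def fast_sum(a, axis):
    """Sum a 2-D array over one axis via a BLAS matrix-vector product.

    Hypothetical helper applying the trick from the thread: summing over
    an axis is the same as multiplying by a vector of ones.
    """
    if axis == 1:
        # Row sums: (m, n) @ (n,) -> (m,)
        return np.dot(a, np.ones(a.shape[1], a.dtype))
    elif axis == 0:
        # Column sums: (m,) @ (m, n) -> (n,)
        return np.dot(np.ones(a.shape[0], a.dtype), a)
    raise ValueError("only 2-D arrays with axis 0 or 1 are supported")

a = np.random.randn(1000, 30)
assert np.allclose(fast_sum(a, 0), a.sum(0))
assert np.allclose(fast_sum(a, 1), a.sum(1))
```

Note the trade-off: this allocates a ones vector and accumulates in floating point through BLAS, so results can differ from a.sum(axis) in the last bits.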
participants (4)
- David Cournapeau
- dmitrey
- Keith Goodman
- Travis Oliphant