[Numpy-discussion] odd performance of sum?

Thu Feb 10 13:31:39 EST 2011

Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote:
[clip]
> One thing that might be worthwhile is to make
> implementations of sum() and cumsum() that avoid the ufunc machinery and
> do their iterations more quickly, at least for some common combinations
> of dtype and contiguity.

I wonder what the balance is between the iterator overhead and the time 
taken in the reduction inner loop. This should be straightforward to 
benchmark.
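
One way such a benchmark could look is sketched below. It keeps the total number of elements fixed and only changes how they are split between rows and columns, so the inner loop length varies while the amount of data summed stays the same. If the per-call time rises sharply as the rows get shorter, that points at per-reduction overhead rather than at the inner loop itself. The shapes, the dtype and the helper name time_row_sums are my assumptions for illustration, not the array from this thread, and the numbers will depend on the NumPy version.

```python
import timeit

import numpy as np


def time_row_sums(n_rows, n_cols, repeat=3, number=5):
    """Return the best per-call time (seconds) of M.sum(1).

    M is a hypothetical C-contiguous float64 array, not the array
    used elsewhere in this thread.
    """
    M = np.ones((n_rows, n_cols), dtype=np.float64)
    timings = timeit.repeat(lambda: M.sum(1), repeat=repeat, number=number)
    return min(timings) / number


# Same total element count, different inner (row) lengths.
total = 1 << 20
results = {}
for n_cols in (4, 64, 1024):
    n_rows = total // n_cols
    results[(n_rows, n_cols)] = time_row_sums(n_rows, n_cols)

for (n_rows, n_cols), seconds in results.items():
    print(f"shape=({n_rows}, {n_cols}): {seconds * 1e3:.3f} ms per call")
```

Running the same script under two NumPy versions would give a rough idea of where the difference between them comes from.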

Apparently, some overhead decreased with the new iterators, since current 
NumPy master outperforms 1.5.1 by a factor of about 1.7 for this benchmark:

In [8]: %timeit M.sum(1)     # Numpy 1.5.1
10 loops, best of 3: 85 ms per loop

In [8]: %timeit M.sum(1)     # Numpy master
10 loops, best of 3: 49.5 ms per loop

I don't think this is explainable by the new memory layout optimizations, 
since M is C-contiguous.
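
For reference, the layout of an array can be checked from its flags and strides. The snippet below is only a sketch with a hypothetical stand-in for M (the shape is an assumption); it contrasts a C-ordered array with a Fortran-ordered copy of the same data.

```python
import numpy as np

# Hypothetical stand-in for M; the real shape is not assumed to match.
M = np.ones((1000, 500), dtype=np.float64)
F = np.asfortranarray(M)  # same values, column-major layout

# C order: the last axis is contiguous in memory.
print(M.flags['C_CONTIGUOUS'], M.flags['F_CONTIGUOUS'], M.strides)

# Fortran order: the first axis is contiguous in memory.
print(F.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'], F.strides)
```

With a C-contiguous M, sum(1) already walks memory in order along the reduced axis.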

Perhaps there is room for further optimization, even within the ufunc 
framework?

-- 
Pauli Virtanen