[Numpy-discussion] odd performance of sum?

Pauli Virtanen pav at iki.fi
Fri Feb 11 05:14:36 EST 2011

Thu, 10 Feb 2011 20:49:28 +0000, Pauli Virtanen wrote:
>   1. Check first if the bottleneck is in the inner reduction loop
> (function DOUBLE_add in loops.c.src:712) or in the outer iteration
> (function PyUFunc_ReductionOp in ufunc_object.c:2781).
>  2. If it's in the inner loop, some optimizations are possible, e.g. 
> specialized cases for sizeof(item) strides. Think how to add them
> cleanly.
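To make the "specialized cases for sizeof(item) strides" idea concrete, here is a toy Python model of a strided add-reduction inner loop. It is not NumPy's DOUBLE_add, which is C generated from loops.c.src. The names `double_add_reduce` and `ITEMSIZE` are hypothetical, and the model assumes a non-negative byte stride. The contiguous case (stride equal to the item size) takes a separate path that decodes packed items without per-item offset arithmetic. The generic case computes each item's byte offset from the stride.

```python
import struct

# Bytes per C double (8 on common platforms).
ITEMSIZE = struct.calcsize("d")

def double_add_reduce(buf, n, stride):
    """Sum n C doubles stored in `buf`, spaced `stride` bytes apart.

    Toy model (hypothetical, not NumPy code) of a strided reduction inner
    loop with one specialized case for contiguous data. Assumes stride >= 0.
    """
    total = 0.0
    if n == 0:
        return total
    if stride == ITEMSIZE:
        # Specialized case: items are packed back to back, so decode them
        # in a single pass with no per-item offset computation.
        for (x,) in struct.iter_unpack("d", buf[: n * ITEMSIZE]):
            total += x
        return total
    # Generic case: locate each item from its byte offset i * stride.
    for i in range(n):
        (x,) = struct.unpack_from("d", buf, i * stride)
        total += x
    return total
```

For example, with `buf = struct.pack("6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)`, `double_add_reduce(buf, 6, ITEMSIZE)` sums all six items (21.0). `double_add_reduce(buf, 3, 2 * ITEMSIZE)` visits every other item (1.0 + 3.0 + 5.0 = 9.0) through the generic path.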

A quick check (replacing the inner loop with a no-op) shows that for 
100 items the bottleneck is in the inner loop. The cross-over between 
inner-loop time and strided-iterator overhead appears to occur at around 
20-30 items (on the machine I used for testing).
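The no-op experiment requires patching NumPy's C source. A rougher check can be done from Python alone. The minimal sketch below is a different technique: it estimates a fixed per-call cost from a 1-element `np.sum`, and a per-item cost from the slope between 1 element and a large array. Their ratio gives a size at which per-item work equals the fixed cost. The helper names `per_call_time` and `crossover_estimate` are hypothetical. The fixed cost here also includes Python call and ufunc dispatch overhead, not just the strided iterator. Current NumPy releases also use a different inner loop than the 2011 code. So the number it reports is not directly comparable to the figure quoted above and will vary by machine and NumPy version.

```python
import timeit

import numpy as np

def per_call_time(n, repeat=5, number=1000):
    """Best-of-`repeat` wall time (seconds) of one np.sum call on n float64s."""
    a = np.ones(n, dtype=np.float64)
    best = min(timeit.repeat(lambda: np.sum(a), repeat=repeat, number=number))
    return best / number

def crossover_estimate(n_large=100_000, **kw):
    """Return (fixed_cost, per_item_cost, crossover_n).

    fixed_cost    ~ time of a 1-element sum (call + setup overhead)
    per_item_cost ~ extra time per element, from the slope between 1 and n_large
    crossover_n   ~ array size at which per-item work equals the fixed cost
    """
    fixed = per_call_time(1, **kw)
    big = per_call_time(n_large, **kw)
    per_item = (big - fixed) / (n_large - 1)
    return fixed, per_item, fixed / per_item
```

Calling `print(crossover_estimate())` prints the three estimates. Taking the minimum over repeats reduces the effect of scheduler noise, but the result is still only an order-of-magnitude indication.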

Anyway, spending time on optimizing the inner loop for at most a 30% 
speed gain seems questionable...

Pauli Virtanen
