[Numpy-discussion] odd performance of sum?
Pauli Virtanen
pav at iki.fi
Fri Feb 11 05:14:36 EST 2011
Thu, 10 Feb 2011 20:49:28 +0000, Pauli Virtanen wrote:
[clip]
> 1. Check first if the bottleneck is in the inner reduction loop
> (function DOUBLE_add in loops.c.src:712) or in the outer iteration
> (function PyUFunc_ReductionOp in ufunc_object.c:2781).
> 2. If it's in the inner loop, some optimizations are possible, e.g.
> specialized cases for sizeof(item) strides. Think how to add them
> cleanly.
A quick check (replacing the inner loop with a no-op) shows that for
100 items the bottleneck is in the inner loop. The cross-over between
inner-loop time and strided-iterator overhead apparently occurs at
around 20-30 items (on the machine I used for testing).
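
To illustrate the kind of check described above, here is a minimal
sketch in C. It is not NumPy code: inner_loop_fn, inner_add, inner_noop
and time_reduction are hypothetical names, and the outer loop here is
far cheaper than the real iterator setup. It only shows the mechanics
of swapping the inner loop for a no-op and timing both variants.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical inner-loop signature, loosely modelled on a reduction
   kernel: accumulate n strided items into *acc. */
typedef void (*inner_loop_fn)(double *acc, const char *data,
                              size_t n, ptrdiff_t stride);

/* The real work: strided sum of n doubles into *acc. */
static void inner_add(double *acc, const char *data,
                      size_t n, ptrdiff_t stride)
{
    double s = *acc;
    for (size_t i = 0; i < n; i++, data += stride) {
        s += *(const double *)data;
    }
    *acc = s;
}

/* The no-op replacement: same signature, touches nothing. */
static void inner_noop(double *acc, const char *data,
                       size_t n, ptrdiff_t stride)
{
    (void)acc; (void)data; (void)n; (void)stride;
}

/* Run `repeats` reductions over the same n contiguous items and return
   the average CPU seconds per call. With inner_noop, what remains is
   the per-call overhead only. The last accumulator value is stored in
   *result. An optimizing compiler may inline or drop the calls, so a
   real measurement needs more care than this sketch takes. */
static double time_reduction(inner_loop_fn inner, const double *items,
                             size_t n, long repeats, double *result)
{
    assert(repeats > 0);
    double acc = 0.0;
    clock_t t0 = clock();
    for (long r = 0; r < repeats; r++) {
        acc = 0.0;
        inner(&acc, (const char *)items, n, (ptrdiff_t)sizeof(double));
    }
    clock_t t1 = clock();
    *result = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)repeats;
}
```

Calling time_reduction once with inner_add and once with inner_noop for
a range of n, and comparing the two timings, gives an estimate of where
the inner loop starts to dominate the per-call overhead.
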
Anyway, spending time on optimizing the inner loop for a speed gain of
at most 30% seems questionable...
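
For reference, the "specialized cases for sizeof(item) strides"
mentioned in point 2 could look roughly like the sketch below. This is
an assumption about the general shape of such an optimization, not the
actual DOUBLE_add code; sum_strided is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n doubles starting at `data`, with `stride` bytes between
   consecutive items. Hypothetical helper, not part of NumPy. */
static double sum_strided(const char *data, size_t n, ptrdiff_t stride)
{
    assert(data != NULL || n == 0);
    double s = 0.0;

    if (stride == (ptrdiff_t)sizeof(double)) {
        /* Specialized case: items are contiguous, so plain array
           indexing can be used, which compilers tend to optimize
           better than pointer arithmetic with a runtime stride. */
        const double *p = (const double *)data;
        for (size_t i = 0; i < n; i++) {
            s += p[i];
        }
    } else {
        /* General case: advance a byte pointer by an arbitrary stride. */
        for (size_t i = 0; i < n; i++, data += stride) {
            s += *(const double *)data;
        }
    }
    return s;
}
```

Both branches compute the same result; only the addressing differs, so
any gain is limited to the inner-loop part of the total time.
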
--
Pauli Virtanen