[Numpy-discussion] odd performance of sum?
Pauli Virtanen
pav at iki.fi
Thu Feb 10 13:31:39 EST 2011
Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote:
[clip]
> One thing that might be worthwhile is to make
> implementations of sum() and cumsum() that avoid the ufunc machinery and
> do their iterations more quickly, at least for some common combinations
> of dtype and contiguity.
I wonder what the balance is between the iterator overhead and the time
taken in the reduction inner loop. This should be straightforward to
benchmark.
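One rough way to separate the two costs is to keep the total number of elements fixed and vary the length of the reduced axis: short rows mean many inner-loop invocations, so per-loop overhead should dominate, while long rows should approach the raw speed of the inner loop. The sketch below does this with `timeit`. The array shapes, dtype, and the helper name `time_row_sums` are assumptions made for illustration; they are not taken from the benchmark discussed in this thread.

```python
import timeit

import numpy as np


def time_row_sums(shape, repeat=3, number=5):
    """Best per-call time in seconds of M.sum(1) on a C-contiguous float64 array.

    Hypothetical helper for this sketch; not part of NumPy.
    """
    M = np.ones(shape, dtype=np.float64)
    timer = timeit.Timer(lambda: M.sum(1))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Same total element count, different inner-loop (row) lengths.
total = 1 << 20
shapes = [(total // n, n) for n in (4, 64, 1024, 16384)]

results = {shape: time_row_sums(shape) for shape in shapes}
for shape, seconds in results.items():
    print("shape=%-16s %.3f ms per call" % (shape, seconds * 1e3))
```

If the per-call time grows sharply as the rows get shorter, that would point at per-row overhead rather than the summation itself; the actual numbers depend on the machine and the NumPy version.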
Apparently, some overhead decreased with the new iterators, since current
NumPy master outperforms 1.5.1 by a factor of about 1.7 on this benchmark:
In [8]: %timeit M.sum(1) # Numpy 1.5.1
10 loops, best of 3: 85 ms per loop
In [8]: %timeit M.sum(1) # Numpy master
10 loops, best of 3: 49.5 ms per loop
I don't think this is explainable by the new memory layout optimizations,
since M is C-contiguous.
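For reference, contiguity can be checked directly from the array's flags and strides. The snippet below is only a sketch: the shape and dtype are assumed for illustration, since the definition of M is clipped above.

```python
import numpy as np

# Assumed shape and dtype; the real M is not shown in this message.
M = np.ones((1000, 500), dtype=np.float64)
F = np.asfortranarray(M)  # same values, column-major layout

# C-contiguous: the last axis is the fastest-varying one in memory.
print(M.flags.c_contiguous, M.strides)
# Fortran-contiguous copy: the first axis is the fastest-varying one.
print(F.flags.f_contiguous, F.strides)

# Both layouts give the same row sums; only the memory access pattern differs.
row_sums_match = np.allclose(M.sum(1), F.sum(1))
```

Timing `M.sum(1)` against `F.sum(1)` on both NumPy versions would show how much of a speedup comes from layout handling rather than from reduced iterator overhead.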
Perhaps there is room for further optimization, even within the ufunc
framework?
--
Pauli Virtanen