[Numpy-discussion] poor performance of sum with sub-machine-word integer types

Charles R Harris charlesr.harris at gmail.com
Tue Jun 21 13:16:19 EDT 2011


On Tue, Jun 21, 2011 at 10:46 AM, Zachary Pincus <zachary.pincus at yale.edu>wrote:

> Hello all,
>
> As a result of the "fast greyscale conversion" thread, I noticed an anomaly
> with numpy.ndarray.sum(): summing along certain axes is much slower with
> sum() than when doing it explicitly, but only with integer dtypes whose
> size is less than the machine word. I checked in 32-bit and 64-bit modes,
> and in both cases the speed difference only went away once the dtype was
> that large. See below...
>
> Is this something to do with numpy or something inexorable about machine /
> memory architecture?
>
>
It's because of the type conversion that sum uses by default for greater
precision.

In [8]: timeit i.sum(axis=-1)
10 loops, best of 3: 140 ms per loop

In [9]: timeit i.sum(axis=-1, dtype=int8)
100 loops, best of 3: 16.2 ms per loop
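To make the conversion concrete, here is a minimal sketch (the array is a
stand-in; the thread's actual array `i` was defined upstream of this
excerpt). By default, integer sums accumulate in at least the platform
integer width, so the result dtype is wider than the input's; passing an
explicit dtype keeps the narrow type and the faster loop, at the cost of
possible overflow:

```python
import numpy as np

# Stand-in for the int8 image array discussed in the thread.
a = np.ones((4, 4), dtype=np.int8)

# Default: the accumulator is promoted to (at least) the platform integer,
# so a type-converting inner loop runs.
print(a.sum(axis=-1).dtype)                 # e.g. int64 on a 64-bit build

# Explicit dtype: accumulation stays in int8, which is faster but can wrap.
print(a.sum(axis=-1, dtype=np.int8).dtype)  # int8
```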

If you have 1.6, einsum is faster but also conserves type:

In [10]: timeit einsum('ijk->ij', i)
100 loops, best of 3: 5.95 ms per loop
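A small sketch of the type-conserving behavior (again with a stand-in
array): einsum reduces over the last axis like sum(axis=-1), but the result
keeps the input dtype rather than being promoted:

```python
import numpy as np

i = np.ones((8, 8, 3), dtype=np.int8)

# einsum('ijk->ij', i) sums out the last axis without widening the dtype.
s = np.einsum('ijk->ij', i)
print(s.dtype)  # int8
```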


We could probably make better loops for summing within kinds, i.e.,
accumulate in higher precision, then cast to specified precision.
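From user code, one way to approximate that today is to accumulate wide and
cast afterwards; a hedged sketch (array name and shape assumed for
illustration):

```python
import numpy as np

# Hypothetical stand-in for the image array from the thread.
i = np.random.randint(0, 100, size=(100, 100, 3)).astype(np.int8)

# Accumulate in int64 for safety, then cast back to the narrow type.
# The truncating cast wraps modulo 256, matching int8 accumulation.
narrow = i.sum(axis=-1, dtype=np.int64).astype(np.int8)
```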

<snip>

Chuck

