[Python-ideas] Fast sum() for non-numbers - why so much worries?

Thu Jul 11 19:19:49 CEST 2013

On 11 July 2013 17:45, Andrew Barnert <abarnert at yahoo.com> wrote:
> On Jul 11, 2013, at 2:07, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>
>> Numpy arrays treat += differently from + in the sense that a += b
>> coerces b to the same dtype as a and then adds in place whereas a + b
>> uses Python style type promotion. This behaviour is by design and it
>> is useful. It is also entirely appropriate (and not pathological) that
>> someone would use sum() to add numpy arrays.
>
> I forgot about this. I was positive on the first patch (+ first, then += for the rest) mainly because it speeds up sum for numpy.

Only by a constant factor. Summing numpy arrays with sum() is O(N)
either way. Anyone who wants to speed that up can use numpy directly,
i.e.:

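import numpy as np
total = np.zeros(shape, dtype=float)  # float accumulator with the arrays' shape
for a in arrays:
    total += a  # in-place add, no new array allocated per step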

is not significantly slower than sum(arrays) if the arrays themselves are large.
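
As a rough way to check this on one's own machine, the sketch below times
both approaches. The workload (50 float arrays of 20,000 elements) and the
helper name loop_total are assumptions made purely for illustration, and the
timings will vary with hardware and numpy version.

```python
import timeit
import numpy as np

# Hypothetical workload: sizes chosen only for illustration.
arrays = [np.full(20_000, float(i)) for i in range(50)]

def loop_total(arrays):
    """Accumulate in place into a float array, as in the loop above."""
    total = np.zeros(arrays[0].shape, dtype=float)
    for a in arrays:
        total += a  # float += float, so no dtype coercion is involved
    return total

# Time 20 repetitions of each approach on the same input.
t_sum = timeit.timeit(lambda: sum(arrays), number=20)
t_loop = timeit.timeit(lambda: loop_total(arrays), number=20)
print(f"sum(arrays): {t_sum:.4f}s   in-place loop: {t_loop:.4f}s")
```

Both approaches touch every element of every array once, so any difference
should only be a constant factor.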

> You probably won't _often_ sum arrays of different dtypes... But if you do, you certainly don't want the result to have the dtype resulting from just coercing start.dtype and iter[0].dtype.

It can easily happen:

import numpy as np
delta_t = 0.1  # time step; value assumed here, not given in the original
initial_velocity = np.array([1, 1, 1])  # Implicitly create an int array
velocities = [initial_velocity]
for n in range(1000):
    velocities.append(0.9 * velocities[-1])  # Append float arrays
final_position = delta_t * sum(velocities)

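With the proposed patch, the running total would be an int array and +=
would coerce every float array to int before adding it, so all 1000
arrays after the first would count as zero in the final result. The
answer would be (delta_t * array([1, 1, 1])) instead of
(delta_t * array([10.0, 10.0, 10.0])).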
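
The effect can be reproduced with a small model of the patch. This is only a
sketch: patched_sum is a hypothetical stand-in for the proposed behaviour
("+" for the first item, then in-place adds), not the actual patch. Current
numpy versions refuse an int += float on arrays, so the sketch requests the
unsafe cast explicitly via np.add(..., casting="unsafe") to emulate the
2013-era in-place semantics.

```python
import numpy as np

def patched_sum(items, start=0):
    """Hypothetical model of the proposed patch: '+' first, then in-place adds."""
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        return start
    total = start + first  # ordinary '+': creates a new object
    for item in it:
        # Emulates 2013-era 'total += item' on arrays: the result is forced
        # back into total's dtype, truncating floats if total is an int array.
        np.add(total, item, out=total, casting="unsafe")
    return total

initial_velocity = np.array([1, 1, 1])  # int array
velocities = [initial_velocity]
for n in range(1000):
    velocities.append(0.9 * velocities[-1])  # float arrays

print(sum(velocities))          # built-in sum: type promotion at every step
print(patched_sum(velocities))  # modelled patch: total stays an int array
```

In this model, passing a float start such as
patched_sum(velocities, start=np.zeros(3)) makes the running total a float
array and avoids the truncation, which is the caveat quoted below.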

> Of course this could be marked as a caveat for numpy--pass a scalar or array of the right dtype for start, and you get the right answer, after all.

I don't think it's acceptable to pass off a backward-incompatible
change of this nature in a minor release. It's the worst kind of
change since there's no DeprecationWarning, no TypeError, just code
that silently produces the wrong result. The error in the result might
be small in some cases (and so not immediately obvious) but massive in
others.

Oscar