Thanks for this, every little helps.

One more thing to mention on this topic.

Above a certain array size, a dot product becomes faster than a sum (due to parallelisation, I guess?).

E.g.
import numpy as np

def dotsum(arr):
    # Sum by reshaping to 2-D and taking a matrix-vector product with ones
    # (BLAS), then adding up the 1000 partial row sums.
    a = arr.reshape(1000, 100)
    return a.dot(np.ones(100)).sum()

a = np.ones(100000)

In [45]: %timeit np.add.reduce(a, axis=None)
42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [43]: %timeit dotsum(a)
26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But theoretically, sum should be faster than the dot product by a fair bit, since the dot product also has to multiply every element before adding.

Isn’t parallelisation implemented for it?
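
For reference, a rough benchmark sketch of where the crossover sits (assuming NumPy is linked against a threaded BLAS such as OpenBLAS or MKL; dot_based_sum and the block size of 100 are just illustrative, and the crossover point will vary by machine):

import timeit
import numpy as np

def dot_based_sum(arr, block=100):
    # Illustrative helper: row sums via a BLAS matrix-vector product,
    # then a small final reduction over the partial sums.
    a = arr.reshape(-1, block)
    return a.dot(np.ones(block)).sum()

for n in (1_000, 10_000, 100_000, 1_000_000):
    x = np.ones(n)
    t_reduce = timeit.timeit(lambda: np.add.reduce(x, axis=None), number=1000)
    t_dot = timeit.timeit(lambda: dot_based_sum(x), number=1000)
    print(f"n={n:>9}  add.reduce: {t_reduce*1e3:8.2f} µs/call  dot-sum: {t_dot*1e3:8.2f} µs/call")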

Regards,
DG


On 16 Feb 2024, at 01:37, Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:

It is more that np.sum checks if there is a .sum() method and if so
calls that.  And then `ndarray.sum()` calls `np.add.reduce(array)`.
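
A quick check that all three spellings end up in the same place (sketch):

import numpy as np

a = np.arange(10.0)
# np.sum dispatches to the .sum() method, which in turn calls np.add.reduce,
# so all three give the same result here.
print(np.sum(a), a.sum(), np.add.reduce(a))   # -> 45.0 45.0 45.0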