Thanks for this, every little helps. One more thing to mention on this topic: beyond a certain size, a dot product becomes faster than sum (due to parallelisation, I guess?). E.g.

import numpy as np

def dotsum(arr):
    a = arr.reshape(1000, 100)
    return a.dot(np.ones(100)).sum()

a = np.ones(100000)

In [45]: %timeit np.add.reduce(a, axis=None)
42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [43]: %timeit dotsum(a)
26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But theoretically, sum should be faster than a dot product by a fair bit. Isn’t parallelisation implemented for it?

Regards,
DG
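P.S. One way to check whether the gap really comes from threading might be to pin the BLAS thread count to one before importing NumPy and re-run the comparison as a plain script. A rough sketch; which environment variable matters depends on the BLAS your NumPy build links against (OPENBLAS_NUM_THREADS for OpenBLAS, MKL_NUM_THREADS for MKL):

import os
# Must be set before NumPy is imported; pick the variable matching your BLAS build.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import timeit
import numpy as np

def dotsum(arr):
    a = arr.reshape(1000, 100)
    return a.dot(np.ones(100)).sum()

a = np.ones(100000)
np.show_config()  # reports which BLAS this NumPy build links against

print("add.reduce:", timeit.timeit(lambda: np.add.reduce(a, axis=None), number=10_000))
print("dotsum:    ", timeit.timeit(lambda: dotsum(a), number=10_000))

If dotsum still wins with a single BLAS thread, the advantage presumably comes from the BLAS kernel itself (SIMD and blocking) rather than from multithreading.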
On 16 Feb 2024, at 01:37, Marten van Kerkwijk <mhvk@astro.utoronto.ca> wrote:
It is more that np.sum checks if there is a .sum() method and if so calls that. And then `ndarray.sum()` calls `np.add.reduce(array)`.
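A toy example consistent with that explanation (the LoudSum class below is just a made-up name for illustration): np.sum hands off to an object's own .sum() method when one exists, while a plain ndarray ends up in np.add.reduce either way.

import numpy as np

class LoudSum:
    # Hypothetical array-like wrapper whose .sum() announces that it was called.
    def __init__(self, data):
        self.data = np.asarray(data)

    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        print("LoudSum.sum() was called")
        return self.data.sum(axis=axis, dtype=dtype, out=out, **kwargs)

print(np.sum(LoudSum([1, 2, 3])))   # prints the message above, then 6
print(np.sum(np.array([1, 2, 3])))  # plain ndarray: reduced via np.add.reduce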