Hi everyone, I’ve started work on getting image moments working in nD. I have two working alternative implementations: one with NumPy broadcasting and one with Numba JIT. Both are consistent with each other and with the existing Cython implementation, with to caveats: 1- my results are the *transpose* of the reference results. I suspected this might be because the reference uses x/y coordinates but that doesn’t seem to be the case. (see code below). 2- each of my implementations is ~10x slower than the Cython implementation. Of these, problem 1 is the most pressing. (Problem 2 can be solved by special-casing 2D.) Can someone familiar with moments compare my implementations and the reference and tell me where I might have gone wrong? I’ve included all the code below for reference. Ideas about speedups are also welcome! To that end, I’ve marked the slow lines in the broadcasting code. (I don’t know how to line profile in Numba; I think it’s currently impossible, in fact.) Juan. ————————— Reference results ———————————— Values (Notice the .T for `moments_central`, which is the existing Cython implementation) In [189]: m.moments_central_nb(horse.astype(float), central_coords, 3) Out[189]: array([[ 8.779e+04, -1.621e-07, 1.301e+09, -6.370e+09], [ -8.245e-07, 9.275e+07, -3.430e+09, 1.938e+12], [ 9.873e+08, -1.363e+10, 1.349e+13, -3.604e+14], [ -2.218e+10, 1.552e+12, -2.849e+14, 3.245e+16]]) In [190]: moments_central(horse.astype(float), *central_coords, 3).T Out[190]: array([[ 8.779e+04, -1.621e-07, 1.301e+09, -6.370e+09], [ -8.245e-07, 9.275e+07, -3.430e+09, 1.938e+12], [ 9.873e+08, -1.363e+10, 1.349e+13, -3.604e+14], [ -2.218e+10, 1.552e+12, -2.849e+14, 3.245e+16]]) In [191]: m.moments_central(horse.astype(float), central_coords, 3) Out[191]: array([[ 8.779e+04, -1.037e-09, 1.301e+09, -6.370e+09], [ -1.979e-09, 9.275e+07, -3.430e+09, 1.938e+12], [ 9.873e+08, -1.363e+10, 1.349e+13, -3.604e+14], [ -2.218e+10, 1.552e+12, -2.849e+14, 3.245e+16]]) ———————————— Timings In [185]: %timeit moments_central(horse.astype(float), *central_coords, 3) 100 loops, best of 3: 3.14 ms per loop In [187]: %timeit m.moments_central_nb(horse.astype(float), central_coords, 3) 10 loops, best of 3: 35.7 ms per loop In [188]: %timeit m.moments_central(horse.astype(float), central_coords, 3) 10 loops, best of 3: 39.4 ms per loop ————————— Implementations ———————————— Existing Cython cpdef moments_central(image_t image, double cr, double cc, Py_ssize_t order): cdef Py_ssize_t p, q, r, c cdef double[:, ::1] mu = np.zeros((order + 1, order + 1), dtype=np.double) cdef double val, dr, dc, dcp, drq for r in range(image.shape[0]): dr = r - cr for c in range(image.shape[1]): dc = c - cc val = image[r, c] dcp = 1 for p in range(order + 1): drq = 1 for q in range(order + 1): mu[p, q] += val * drq * dcp drq *= dr dcp *= dc return np.asarray(mu) ———————————— Broadcasting import numpy as np import numba from functools import reduce def moments_central(image, center=None, order=3): coords = np.nonzero(image) values = image[coords] coords_stack = np.column_stack(np.nonzero(image)) if center is None: center = np.mean(coords_stack, axis=0) ccoords = coords_stack - center powers = np.arange(order + 1) result_shape = (order + 1,) * image.ndim + (coords_stack.shape[0],) coords_pow = [] for dim in range(image.ndim): power_shape = [1,] * (image.ndim + 1) power_shape[dim] = order + 1 powers = np.reshape(powers, power_shape) ccoords_pow = ccoords[:, dim] ** powers # SLOW, 50% of time bcast_coords_pow = np.broadcast_to(ccoords_pow, result_shape) coords_pow.append(bcast_coords_pow) result = np.sum(reduce(np.multiply, coords_pow + [values]), axis=-1) # SLOW, 30% of time return result ———————————— Numba def moments_central_nb(image, center=None, order=3): mu = np.zeros((order + 1,) * image.ndim) if center is None: center = np.mean(np.column_stack(np.nonzero(image)), axis=0) _moments_central_nb_loop(image, center, mu) return mu @numba.njit def _moments_central_nb_loop(image, center, mu): for coord, val in np.ndenumerate(image): if val > 0: for powers in np.ndindex(mu.shape): curval = val for c, ctr, p in zip(coord, center, powers): dc = c - ctr cr = 1 for pcount in range(p): cr *= dc curval *= cr mu[powers] += curval