Re: [scikit-image] Help with making moments nD
Hi Jaime,

Thanks for an outrageously thorough response!

First, I totally missed that the dc and dr multipliers were in a different order! Looks like we will have to do some deprecation!

Second, my own timings match yours, even for the 2D horse image. I’ve renamed my function moments_central_np0 and yours _np[1-3]. Plain moments_central is the skimage implementation.

In [216]: %timeit moments_central(horse, *central_coords, 3)
100 loops, best of 3: 3.11 ms per loop

In [217]: %timeit m.moments_central_np0(horse, central_coords, 3)
10 loops, best of 3: 39.2 ms per loop

In [218]: %timeit m.moments_central_np1(horse, central_coords, 3)
100 loops, best of 3: 6.49 ms per loop

In [219]: %timeit m.moments_central_np2(horse, central_coords, 3)
100 loops, best of 3: 5.28 ms per loop

In [220]: %timeit m.moments_central_np3(horse, central_coords, 3)
The slowest run took 5.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 375 µs per loop

In [221]: %timeit m.moments_central_np3(horse, central_coords, 3)
1000 loops, best of 3: 396 µs per loop

The only thing I’m concerned about is that “intermediate result is being cached” warning. Any idea why that could be? It only seems to happen with the third implementation…! Either way, even its slowest run is faster than the other functions!

Anyway, thanks so much for your help!

Juan.

On 7 Apr 2017, 9:22 PM +1000, Jaime Fernández del Río <jaime.frio@gmail.com>, wrote:
Hi Juan,
The transposition is almost certainly due to the fact that in the Cython implementation the p loop, which runs over the rows of the moments array, uses the dc multiplier (which comes from the image columns), while the q loop, which runs over the columns of the moments array, uses the dr multiplier (which comes from the image rows). Your other code chooses the multiplier corresponding to the same axis in both the image and the moments arrays. It is not obvious to me how to extend the current behavior to higher dimensions.
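To make the 2D relationship concrete, here is a minimal sketch of the axis-consistent convention (the function name and the einsum formulation are illustrative only, not taken from scikit-image); the existing Cython result should then be the transpose of what this computes:

import numpy as np

def moments_central_axis_consistent(image, cr, cc, order=3):
    # mu[p, q] = sum over (r, c) of image[r, c] * (r - cr)**p * (c - cc)**q,
    # i.e. the row deltas pair with the first moments index and the column
    # deltas with the second.
    dr = np.arange(image.shape[0], dtype=float) - cr
    dc = np.arange(image.shape[1], dtype=float) - cc
    return np.einsum('rc,rp,cq->pq',
                     image.astype(float),
                     dr[:, None] ** np.arange(order + 1),
                     dc[:, None] ** np.arange(order + 1))

# With this convention, the current Cython moments_central(image, cr, cc, order)
# should equal moments_central_axis_consistent(image, cr, cc, order).T.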
As for your implementation, using np.nonzero is probably not going to pay off unless your image is very sparse, because you go from handling a single value per image pixel to handling an extra coordinate value per pixel for each dimension. And you don't really need to compute an offset for each pixel, since many pixels share the offsets along some dimensions; e.g. for a 100 x 100 x 100 3D image, this code is 2-3x faster:
import numpy as np
import itertools

def moments_central(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    # Reshape each delta so that it can properly broadcast against image.
    deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
              for dim, delta in enumerate(deltas)]
    moments = np.empty((order + 1,) * image.ndim, dtype=float)
    reusable_buffer = np.empty_like(image, dtype=float)
    for powers in itertools.product(range(order + 1), repeat=image.ndim):
        reusable_buffer[:] = image
        for power, delta in zip(powers, deltas):
            reusable_buffer *= delta ** power
        moments[tuple(powers)] = reusable_buffer.sum()
    return moments
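For concreteness, a quick usage sketch (the random 100 x 100 x 100 volume here is hypothetical, just to show the call and the shape of the result):

volume = np.random.random((100, 100, 100))  # hypothetical test volume
mu = moments_central(volume, order=3)       # center defaults to the middle of the volume
print(mu.shape)  # (4, 4, 4): one entry per combination of per-axis powers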
Notice that this just iterates over the items of the moments array. Broadcasting in this type of problem is dangerous, because you basically replace an N-item loop with creating arrays N times larger than your original data. And Python loops are slow, but not THAT slow. You can actually make the above idea run even faster, almost 2x, by reducing the dimensionality of your result as early as possible:
import itertools

def moments_central_faster(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    # Reshape each delta so that it can properly broadcast against image.
    deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
              for dim, delta in enumerate(deltas)]
    moments = np.empty((order + 1,) * image.ndim, dtype=float)
    for powers in itertools.product(range(order + 1), repeat=image.ndim):
        calc = image.copy()
        for dim, (power, delta) in enumerate(zip(powers, deltas)):
            calc *= delta ** power
            calc = np.add.reduce(calc, axis=dim, keepdims=True)
        moments[tuple(powers)] = calc.sum()
    return moments
This idea of reducing dimensionality can be given a twist in the form of stacked matrix multiplication: one can think of the whole calculation as reducing each dimension of size D to O, where O is the order plus one, by multiplying by a D x O matrix holding the powers of the deltas along that dimension. In code it could look something like this:

def moments_central_even_faster(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    calc = image
    for dim, delta in enumerate(deltas):
        powers_of_delta = delta[:, None] ** np.arange(order + 1)
        calc = np.rollaxis(calc, dim, image.ndim)
        calc = np.dot(calc, powers_of_delta)
        calc = np.rollaxis(calc, -1, dim)
    return calc
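As a quick sanity check (a small hypothetical random volume, not part of the timings quoted below), all three versions above should agree to floating-point precision:

image3d = np.random.random((20, 30, 40))  # small hypothetical test volume
a = moments_central(image3d, order=3)
b = moments_central_faster(image3d, order=3)
c = moments_central_even_faster(image3d, order=3)
assert np.allclose(a, b) and np.allclose(a, c)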
This turns out to be ridiculously fast for the 100 x 100 x 100 image case (1.41 s for your implementation vs 520 ms, 258 ms and 10.5 ms for the three versions above), so over 100x (!!!) faster than your implementation.
On a 1000 x 1000 2D image it's fast, but not the fastest (571 ms vs 122 ms vs 92.8 ms vs 113 ms); still, the fastest version is probably in the same ballpark as the current Cython implementation.
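If it helps to reproduce the comparison outside of IPython's %timeit (which is what produced the numbers above), a rough harness along these lines should do; the image shape and repeat counts here are arbitrary:

import timeit

img3d = np.random.random((100, 100, 100))  # hypothetical test volume
for fn in (moments_central, moments_central_faster, moments_central_even_faster):
    # best of 3 repeats, 3 calls per repeat, reported in milliseconds
    best = min(timeit.repeat(lambda: fn(img3d, order=3), repeat=3, number=3)) / 3
    print(fn.__name__, '%.1f ms' % (best * 1e3))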
Let me know how your timings compare.
Jaime
On Fri, Apr 7, 2017 at 9:16 AM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Hi everyone,
I’ve started work on getting image moments working in nD. I have two working alternative implementations: one with NumPy broadcasting and one with Numba JIT. Both are consistent with each other and with the existing Cython implementation, with two caveats:
1- my results are the *transpose* of the reference results. I suspected this might be because the reference uses x/y coordinates, but that doesn’t seem to be the case (see code below).
2- each of my implementations is ~10x slower than the Cython implementation.
Of these, problem 1 is the most pressing. (Problem 2 can be solved by special-casing 2D.) Can someone familiar with moments compare my implementations and the reference and tell me where I might have gone wrong? I’ve included all the code below for reference.
Ideas about speedups are also welcome! To that end, I’ve marked the slow lines in the broadcasting code. (I don’t know how to line profile in Numba; I think it’s currently impossible, in fact.)
Juan.
————————— Reference results
———————————— Values
(Notice the .T for `moments_central`, which is the existing Cython implementation)
In [189]: m.moments_central_nb(horse.astype(float), central_coords, 3)
Out[189]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [190]: moments_central(horse.astype(float), *central_coords, 3).T
Out[190]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [191]: m.moments_central(horse.astype(float), central_coords, 3)
Out[191]:
array([[  8.779e+04,  -1.037e-09,   1.301e+09,  -6.370e+09],
       [ -1.979e-09,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])
———————————— Timings
In [185]: %timeit moments_central(horse.astype(float), *central_coords, 3)
100 loops, best of 3: 3.14 ms per loop

In [187]: %timeit m.moments_central_nb(horse.astype(float), central_coords, 3)
10 loops, best of 3: 35.7 ms per loop

In [188]: %timeit m.moments_central(horse.astype(float), central_coords, 3)
10 loops, best of 3: 39.4 ms per loop
————————— Implementations
———————————— Existing Cython

cpdef moments_central(image_t image, double cr, double cc, Py_ssize_t order):
    cdef Py_ssize_t p, q, r, c
    cdef double[:, ::1] mu = np.zeros((order + 1, order + 1), dtype=np.double)
    cdef double val, dr, dc, dcp, drq
    for r in range(image.shape[0]):
        dr = r - cr
        for c in range(image.shape[1]):
            dc = c - cc
            val = image[r, c]
            dcp = 1
            for p in range(order + 1):
                drq = 1
                for q in range(order + 1):
                    mu[p, q] += val * drq * dcp
                    drq *= dr
                dcp *= dc
    return np.asarray(mu)
———————————— Broadcasting

import numpy as np
import numba
from functools import reduce

def moments_central(image, center=None, order=3):
    coords = np.nonzero(image)
    values = image[coords]
    coords_stack = np.column_stack(np.nonzero(image))
    if center is None:
        center = np.mean(coords_stack, axis=0)
    ccoords = coords_stack - center
    powers = np.arange(order + 1)
    result_shape = (order + 1,) * image.ndim + (coords_stack.shape[0],)
    coords_pow = []
    for dim in range(image.ndim):
        power_shape = [1,] * (image.ndim + 1)
        power_shape[dim] = order + 1
        powers = np.reshape(powers, power_shape)
        ccoords_pow = ccoords[:, dim] ** powers  # SLOW, 50% of time
        bcast_coords_pow = np.broadcast_to(ccoords_pow, result_shape)
        coords_pow.append(bcast_coords_pow)
    result = np.sum(reduce(np.multiply, coords_pow + [values]),
                    axis=-1)  # SLOW, 30% of time
    return result
———————————— Numba

def moments_central_nb(image, center=None, order=3):
    mu = np.zeros((order + 1,) * image.ndim)
    if center is None:
        center = np.mean(np.column_stack(np.nonzero(image)), axis=0)
    _moments_central_nb_loop(image, center, mu)
    return mu

@numba.njit
def _moments_central_nb_loop(image, center, mu):
    for coord, val in np.ndenumerate(image):
        if val > 0:
            for powers in np.ndindex(mu.shape):
                curval = val
                for c, ctr, p in zip(coord, center, powers):
                    dc = c - ctr
                    cr = 1
                    for pcount in range(p):
                        cr *= dc
                    curval *= cr
                mu[powers] += curval
--
(\__/)
( O.o)
( > <)
This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.