[scikit-image] Help with making moments nD

Fri Apr 7 07:22:22 EDT 2017

Hi Juan,

The transposition is almost certainly due to the fact that in the Cython
implementation the p loop, which loops rows of the moments array, uses the
dc multiplier, which comes from the image columns, while the q loop, which
loops columns of the moments array, uses the dr multiplier, which comes
from the image rows. Your other code chooses the multiplier corresponding
to the same axes in both the image and the moments arrays. It is not
obvious to me how to extend the current behavior to higher dimensions.

As for your implementation, using np.nonzero is probably not going to pay
off unless your image is very sparse, because you basically go from having
to handle a single value per image pixel, to add an extra coordinate value
for each dimension. And you don't really need to compute an offset for each
pixel, since many pixels share the offsets along some dimensions, e.g. for
a 100 x 100 x 100 3D image, this code is 2-3x faster:

import itertools

def moments_central(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  # Reshape each delta so that it can properly broadcast against image.
  deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim -
1))
            for dim, delta in enumerate(deltas)]
  moments = np.empty((order + 1,) * image.ndim, dtype=float)
  reusable_buffer = np.empty_like(image, dtype=float)
  for powers in itertools.product(range(order + 1), repeat=image.ndim):
    reusable_buffer[:] = image
    for power, delta in zip(powers, deltas):
      reusable_buffer *= delta ** power
    moments[tuple(powers)] = reusable_buffer.sum()

  return moments

Notice that this just iterates over the moments items. Broadcasting in this
type of problems is dangerous, because you basically replace an N item loop
with creating an arrays N times larger than your original data. And Python
loops are slow, but not THAT slow. You can actually make the above idea run
even faster, almost 2x, by reducing the dimensionality of your result as
soon as possible earlier:

import itertools

def moments_central_faster(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  # Reshape each delta so that it can properly broadcast against image.
  deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim -
1))
            for dim, delta in enumerate(deltas)]
  moments = np.empty((order + 1,) * image.ndim, dtype=float)
  for powers in itertools.product(range(order + 1), repeat=image.ndim):
    calc = image.copy()
    for dim, (power, delta) in enumerate(zip(powers, deltas)):
      calc *= delta ** power
      calc = np.add.reduce(calc, axis=dim, keepdims=True)
    moments[tuple(powers)] = calc.sum()

  return moments

This idea of reducing dimensionality can be given a twist in the form of
stacked matrix multiplication: one can think of the whole calculation as
reducing each dimension of size D to O, where O is the order plus one, by
multiplying by a DXO matrix, holding the powers of the deltas along that
dimension. In code it could look something like this:

def moments_central_even_faster(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  calc = image
  for dim, delta in enumerate(deltas):
    powers_of_delta = delta[:, None] ** np.arange(order + 1)
    calc = np.rollaxis(calc, dim, image.ndim)
    calc = np.dot(calc, powers_of_delta)
    calc = np.rollaxis(calc, -1, dim)

  return calc

This turns out to be ridiculously fast for the 100 x 100 x 100 image case
(1.41 s vs 520 ms vs 258 ms vs 10.5 ms), so over 100x (!!!) faster than
your implementation.

On a 1000 x 1000 2D image it's fast, but not the fastest (571 ms vs 122 ms
vs 92.8 ms vs 113 ms), but the fastest version is probably in the same
ballpark as the current Cython implementation.

Let me know how do your timings compare.

Jaime

On Fri, Apr 7, 2017 at 9:16 AM, Juan Nunez-Iglesias <jni.soma at gmail.com>
wrote:

> Hi everyone,
>
> I’ve started work on getting image moments working in nD. I have two
> working alternative implementations: one with NumPy broadcasting and one
> with Numba JIT. Both are consistent with each other and with the existing
> Cython implementation, with to caveats:
>
> 1- my results are the *transpose* of the reference results. I suspected
> this might be because the reference uses x/y coordinates but that doesn’t
> seem to be the case. (see code below).
>
> 2- each of my implementations is ~10x slower than the Cython
> implementation.
>
> Of these, problem 1 is the most pressing. (Problem 2 can be solved by
> special-casing 2D.) Can someone familiar with moments compare my
> implementations and the reference and tell me where I might have gone
> wrong? I’ve included all the code below for reference.
>
> Ideas about speedups are also welcome! To that end, I’ve marked the slow
> lines in the broadcasting code. (I don’t know how to line profile in Numba;
> I think it’s currently impossible, in fact.)
>
> Juan.
>
> *————————— Reference results*
>
> ———————————— Values
>
> (Notice the .T for `moments_central`, which is the existing Cython
> implementation)
>
> In [189]: m.moments_central_nb(horse.astype(float), central_coords, 3)
> Out[189]:
> array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
>        [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
>        [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
>        [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])
>
> In [190]: moments_central(horse.astype(float), *central_coords, 3).T
> Out[190]:
> array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
>        [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
>        [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
>        [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])
>
> In [191]: m.moments_central(horse.astype(float), central_coords, 3)
> Out[191]:
> array([[  8.779e+04,  -1.037e-09,   1.301e+09,  -6.370e+09],
>        [ -1.979e-09,   9.275e+07,  -3.430e+09,   1.938e+12],
>        [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
>        [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])
>
> ———————————— Timings
>
> In [185]: %timeit moments_central(horse.astype(float), *central_coords, 3)
> 100 loops, best of 3: 3.14 ms per loop
>
> In [187]: %timeit m.moments_central_nb(horse.astype(float),
> central_coords, 3)
> 10 loops, best of 3: 35.7 ms per loop
>
> In [188]: %timeit m.moments_central(horse.astype(float), central_coords,
> 3)
> 10 loops, best of 3: 39.4 ms per loop
>
> *————————— Implementations*
>
> ———————————— Existing Cython
> cpdef moments_central(image_t image, double cr, double cc, Py_ssize_t
> order):
>     cdef Py_ssize_t p, q, r, c
>     cdef double[:, ::1] mu = np.zeros((order + 1, order + 1),
> dtype=np.double)
>     cdef double val, dr, dc, dcp, drq
>     for r in range(image.shape[0]):
>         dr = r - cr
>         for c in range(image.shape[1]):
>             dc = c - cc
>             val = image[r, c]
>             dcp = 1
>             for p in range(order + 1):
>                 drq = 1
>                 for q in range(order + 1):
>                     mu[p, q] += val * drq * dcp
>                     drq *= dr
>                 dcp *= dc
>     return np.asarray(mu)
>
> ———————————— Broadcasting
> import numpy as np
> import numba
> from functools import reduce
>
> def moments_central(image, center=None, order=3):
>     coords = np.nonzero(image)
>     values = image[coords]
>     coords_stack = np.column_stack(np.nonzero(image))
>     if center is None:
>         center = np.mean(coords_stack, axis=0)
>     ccoords = coords_stack - center
>     powers = np.arange(order + 1)
>     result_shape = (order + 1,) * image.ndim + (coords_stack.shape[0],)
>     coords_pow = []
>     for dim in range(image.ndim):
>         power_shape = [1,] * (image.ndim + 1)
>         power_shape[dim] = order + 1
>         powers = np.reshape(powers, power_shape)
>         ccoords_pow = ccoords[:, dim] ** powers  # SLOW, 50% of time
>         bcast_coords_pow = np.broadcast_to(ccoords_pow, result_shape)
>         coords_pow.append(bcast_coords_pow)
>     result = np.sum(reduce(np.multiply, coords_pow + [values]), axis=-1)
>  # SLOW, 30% of time
>     return result
>
> ———————————— Numba
> def moments_central_nb(image, center=None, order=3):
>     mu = np.zeros((order + 1,) * image.ndim)
>     if center is None:
>         center = np.mean(np.column_stack(np.nonzero(image)), axis=0)
>     _moments_central_nb_loop(image, center, mu)
>     return mu
>
>
> @numba.njit
> def _moments_central_nb_loop(image, center, mu):
>     for coord, val in np.ndenumerate(image):
>         if val > 0:
>             for powers in np.ndindex(mu.shape):
>                 curval = val
>                 for c, ctr, p in zip(coord, center, powers):
>                     dc = c - ctr
>                     cr = 1
>                     for pcount in range(p):
>                         cr *= dc
>                     curval *= cr
>                 mu[powers] += curval
>
>
>
>
> _______________________________________________
> scikit-image mailing list
> scikit-image at python.org
> https://mail.python.org/mailman/listinfo/scikit-image
>
>

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-image/attachments/20170407/bd49e6f2/attachment-0001.html>