Hi Juan,

The transposition is almost certainly due to the fact that, in the Cython implementation, the p loop (over rows of the moments array) uses the dc multiplier, which comes from the image columns, while the q loop (over columns of the moments array) uses the dr multiplier, which comes from the image rows. Your code instead pairs each axis of the moments array with the same axis of the image. It is not obvious to me how the current behavior would extend to higher dimensions.
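
To make that pairing concrete, here is a tiny pure-Python sketch on a made-up 2D image (the names and test values are mine, not from your code) showing that swapping which multiplier goes with which index of mu simply transposes the result:

import numpy as np

# A made-up 5 x 7 test image, just to illustrate the index pairing.
image = np.random.random((5, 7))
cr, cc = 2.0, 3.0
order = 3

mu_cython_style = np.zeros((order + 1, order + 1))  # p paired with dc, q with dr
mu_axis_matched = np.zeros((order + 1, order + 1))  # p paired with dr, q with dc
for r in range(image.shape[0]):
    dr = r - cr
    for c in range(image.shape[1]):
        dc = c - cc
        for p in range(order + 1):
            for q in range(order + 1):
                mu_cython_style[p, q] += image[r, c] * dc ** p * dr ** q
                mu_axis_matched[p, q] += image[r, c] * dr ** p * dc ** q

# Swapping the pairing just transposes the result.
assert np.allclose(mu_cython_style, mu_axis_matched.T)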

As for your implementation, using np.nonzero is probably not going to pay off unless your image is very sparse, because you go from handling a single value per image pixel to adding an extra coordinate value per pixel for each dimension. And you don't really need to compute an offset for each pixel, since many pixels share the same offsets along each dimension. E.g. for a 100 x 100 x 100 3D image, this code is 2-3x faster:

import itertools
import numpy as np

def moments_central(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  # Reshape each delta so that it can properly broadcast against image.
  deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
            for dim, delta in enumerate(deltas)]
  moments = np.empty((order + 1,) * image.ndim, dtype=float)
  reusable_buffer = np.empty_like(image, dtype=float)
  for powers in itertools.product(range(order + 1), repeat=image.ndim):
    reusable_buffer[:] = image
    for power, delta in zip(powers, deltas):
      reusable_buffer *= delta ** power
    moments[tuple(powers)] = reusable_buffer.sum()

  return moments

Notice that this just iterates over the entries of the moments array. Broadcasting in this type of problem is dangerous, because you basically replace an N-item loop with creating arrays N times larger than your original data. And Python loops are slow, but not THAT slow. You can actually make the above idea run almost 2x faster by reducing the dimensionality of the result as early as possible:

import itertools
import numpy as np

def moments_central_faster(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  # Reshape each delta so that it can properly broadcast against image.
  deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
            for dim, delta in enumerate(deltas)]
  moments = np.empty((order + 1,) * image.ndim, dtype=float)
  for powers in itertools.product(range(order + 1), repeat=image.ndim):
    calc = image.astype(float)  # float copy, so the in-place multiplies below are safe for integer images
    for dim, (power, delta) in enumerate(zip(powers, deltas)):
      calc *= delta ** power
      calc = np.add.reduce(calc, axis=dim, keepdims=True)
    moments[tuple(powers)] = calc.sum()

  return moments

This idea of reducing dimensionality can be given a twist in the form of stacked matrix multiplication: one can think of the whole calculation as reducing each dimension of size D to size O, where O is the order plus one, by multiplying by a D x O matrix holding the powers of the deltas along that dimension. In code it could look something like this:

import numpy as np

def moments_central_even_faster(image, center=None, order=3):
  image = np.asarray(image)
  if center is None:
    center = np.array(image.shape, dtype=float) / 2
  center = np.asarray(center)
  assert center.ndim == 1 and center.size == image.ndim
  # deltas is a list of arrays holding the deltas from the center along each
  # dimension of the image.
  deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
            for dim in range(image.ndim)]
  calc = image
  for dim, delta in enumerate(deltas):
    powers_of_delta = delta[:, None] ** np.arange(order + 1)
    calc = np.rollaxis(calc, dim, image.ndim)
    calc = np.dot(calc, powers_of_delta)
    calc = np.rollaxis(calc, -1, dim)
  
  return calc
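
As a quick sanity check (a hypothetical snippet, assuming the three functions above are defined in the same session): all three versions agree with each other, and the zeroth-order moment is just the sum of the image, since every delta factor is raised to the power zero:

import numpy as np

image = np.random.random((20, 30, 40))
m1 = moments_central(image, order=3)
m2 = moments_central_faster(image, order=3)
m3 = moments_central_even_faster(image, order=3)
assert m1.shape == m2.shape == m3.shape == (4, 4, 4)
assert np.allclose(m1, m2) and np.allclose(m1, m3)
# With all powers zero, every delta factor is 1, so the zeroth moment
# reduces to the plain sum of the image.
assert np.isclose(m1[0, 0, 0], image.sum())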

This turns out to be ridiculously fast for the 100 x 100 x 100 image case: 1.41 s for your implementation vs 520 ms, 258 ms and 10.5 ms for the three versions above, so over 100x (!!!) faster than yours.

On a 1000 x 1000 2D image it is fast, but not the fastest (571 ms vs 122 ms vs 92.8 ms vs 113 ms), although the fastest of these is probably in the same ballpark as the current Cython implementation.
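
In case it helps with the comparison, here is a rough sketch of how you could time the three versions on your end (it assumes they are defined in the current session; the absolute numbers will of course depend on the machine):

import timeit

import numpy as np

image = np.random.random((100, 100, 100))
for func in (moments_central, moments_central_faster, moments_central_even_faster):
    # best of 3 single runs, reported in milliseconds
    best = min(timeit.repeat(lambda: func(image, order=3), number=1, repeat=3))
    print('%s: %.1f ms' % (func.__name__, best * 1e3))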

Let me know how your timings compare.

Jaime



On Fri, Apr 7, 2017 at 9:16 AM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Hi everyone,

I’ve started work on getting image moments working in nD. I have two working alternative implementations: one with NumPy broadcasting and one with Numba JIT. Both are consistent with each other and with the existing Cython implementation, with two caveats:

1- my results are the *transpose* of the reference results. I suspected this might be because the reference uses x/y coordinates, but that doesn’t seem to be the case (see code below).

2- each of my implementations is ~10x slower than the Cython implementation.

Of these, problem 1 is the most pressing. (Problem 2 can be solved by special-casing 2D.) Can someone familiar with moments compare my implementations and the reference and tell me where I might have gone wrong? I’ve included all the code below for reference.

Ideas about speedups are also welcome! To that end, I’ve marked the slow lines in the broadcasting code. (I don’t know how to line profile in Numba; I think it’s currently impossible, in fact.)

Juan.

————————— Reference results

———————————— Values

(Notice the .T for `moments_central`, which is the existing Cython implementation)

In [189]: m.moments_central_nb(horse.astype(float), central_coords, 3)
Out[189]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [190]: moments_central(horse.astype(float), *central_coords, 3).T
Out[190]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [191]: m.moments_central(horse.astype(float), central_coords, 3)
Out[191]:
array([[  8.779e+04,  -1.037e-09,   1.301e+09,  -6.370e+09],
       [ -1.979e-09,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

———————————— Timings

In [185]: %timeit moments_central(horse.astype(float), *central_coords, 3)
100 loops, best of 3: 3.14 ms per loop

In [187]: %timeit m.moments_central_nb(horse.astype(float), central_coords, 3)
10 loops, best of 3: 35.7 ms per loop

In [188]: %timeit m.moments_central(horse.astype(float), central_coords, 3)
10 loops, best of 3: 39.4 ms per loop

————————— Implementations

———————————— Existing Cython
cpdef moments_central(image_t image, double cr, double cc, Py_ssize_t order):
    cdef Py_ssize_t p, q, r, c
    cdef double[:, ::1] mu = np.zeros((order + 1, order + 1), dtype=np.double)
    cdef double val, dr, dc, dcp, drq
    for r in range(image.shape[0]):
        dr = r - cr
        for c in range(image.shape[1]):
            dc = c - cc
            val = image[r, c]
            dcp = 1
            for p in range(order + 1):
                drq = 1
                for q in range(order + 1):
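                    # note: p (a row of mu) is paired with dcp (image columns)
                    # and q (a column of mu) with drq (image rows)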
                    mu[p, q] += val * drq * dcp
                    drq *= dr
                dcp *= dc
    return np.asarray(mu)

———————————— Broadcasting
import numpy as np
import numba
from functools import reduce

def moments_central(image, center=None, order=3):
    coords = np.nonzero(image)
    values = image[coords]
    coords_stack = np.column_stack(np.nonzero(image))
    if center is None:
        center = np.mean(coords_stack, axis=0)
    ccoords = coords_stack - center
    powers = np.arange(order + 1)
    result_shape = (order + 1,) * image.ndim + (coords_stack.shape[0],)
    coords_pow = []
    for dim in range(image.ndim):
        power_shape = [1,] * (image.ndim + 1)
        power_shape[dim] = order + 1
        powers = np.reshape(powers, power_shape)
        ccoords_pow = ccoords[:, dim] ** powers  # SLOW, 50% of time
        bcast_coords_pow = np.broadcast_to(ccoords_pow, result_shape)
        coords_pow.append(bcast_coords_pow)
    result = np.sum(reduce(np.multiply, coords_pow + [values]), axis=-1)  # SLOW, 30% of time
    return result

———————————— Numba
def moments_central_nb(image, center=None, order=3):
    mu = np.zeros((order + 1,) * image.ndim)
    if center is None:
        center = np.mean(np.column_stack(np.nonzero(image)), axis=0)
    _moments_central_nb_loop(image, center, mu)
    return mu


@numba.njit
def _moments_central_nb_loop(image, center, mu):
    for coord, val in np.ndenumerate(image):
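        # zero-valued (background) pixels contribute nothing to any moment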
        if val > 0:
            for powers in np.ndindex(mu.shape):
                curval = val
                for c, ctr, p in zip(coord, center, powers):
                    dc = c - ctr
                    cr = 1
                    for pcount in range(p):
                        cr *= dc
                    curval *= cr
                mu[powers] += curval




_______________________________________________
scikit-image mailing list
scikit-image@python.org
https://mail.python.org/mailman/listinfo/scikit-image




--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.