Hi Juan,

The transposition is almost certainly due to the fact that in the Cython implementation the p loop, which runs over the rows of the moments array, uses the dc multiplier, which comes from the image columns, while the q loop, which runs over the columns of the moments array, uses the dr multiplier, which comes from the image rows. Your other code chooses the multiplier corresponding to the same axis in both the image and the moments arrays. It is not obvious to me how to extend the current behavior to higher dimensions.

As for your implementation, using np.nonzero is probably not going to pay off unless your image is very sparse, because you go from handling a single value per image pixel to handling an extra coordinate value per pixel for each dimension. And you don't really need to compute an offset for each pixel, since many pixels share the offsets along some dimensions. E.g. for a 100 x 100 x 100 3D image, this code is 2-3x faster:

import itertools
import numpy as np

def moments_central(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    # Reshape each delta so that it can properly broadcast against image.
    deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
              for dim, delta in enumerate(deltas)]
    moments = np.empty((order + 1,) * image.ndim, dtype=float)
    reusable_buffer = np.empty_like(image, dtype=float)
    for powers in itertools.product(range(order + 1), repeat=image.ndim):
        reusable_buffer[:] = image
        for power, delta in zip(powers, deltas):
            reusable_buffer *= delta ** power
        moments[tuple(powers)] = reusable_buffer.sum()
    return moments

Notice that this just iterates over the items of the moments array.
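For reference, here is a quick sanity check I would run (a sketch: the small random image, the `moments_buffered` name, and the brute-force helper are all just for illustration) showing that this delta-broadcasting approach matches the direct per-pixel definition of central moments:

```python
import itertools
import numpy as np

def moments_brute_force(image, center, order=3):
    # Direct definition: for each pixel, add value * prod(delta ** power)
    # over every combination of powers up to `order`.
    moments = np.zeros((order + 1,) * image.ndim)
    for coord, val in np.ndenumerate(image):
        deltas = np.asarray(coord, dtype=float) - center
        for powers in itertools.product(range(order + 1), repeat=image.ndim):
            moments[powers] += val * np.prod(deltas ** powers)
    return moments

def moments_buffered(image, center, order=3):
    # Same reusable-buffer approach as above, minus the input checks.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    deltas = [d.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
              for dim, d in enumerate(deltas)]
    moments = np.empty((order + 1,) * image.ndim)
    buf = np.empty_like(image, dtype=float)
    for powers in itertools.product(range(order + 1), repeat=image.ndim):
        buf[:] = image
        for power, delta in zip(powers, deltas):
            buf *= delta ** power
        moments[tuple(powers)] = buf.sum()
    return moments

rng = np.random.RandomState(0)
img = rng.rand(4, 5, 6)
ctr = np.array(img.shape, dtype=float) / 2
assert np.allclose(moments_brute_force(img, ctr), moments_buffered(img, ctr))
```

The zeroth moment should also come out as the plain sum of the image, which is another cheap invariant to assert.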
Broadcasting in this type of problem is dangerous, because you basically replace an N-item loop with the creation of arrays N times larger than your original data. And Python loops are slow, but not THAT slow.

You can actually make the above idea run even faster, almost 2x, by reducing the dimensionality of your result as early as possible:

import itertools

def moments_central_faster(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    # Reshape each delta so that it can properly broadcast against image.
    deltas = [delta.reshape((1,) * dim + (-1,) + (1,) * (image.ndim - dim - 1))
              for dim, delta in enumerate(deltas)]
    moments = np.empty((order + 1,) * image.ndim, dtype=float)
    for powers in itertools.product(range(order + 1), repeat=image.ndim):
        calc = image.copy()
        for dim, (power, delta) in enumerate(zip(powers, deltas)):
            calc *= delta ** power
            calc = np.add.reduce(calc, axis=dim, keepdims=True)
        moments[tuple(powers)] = calc.sum()
    return moments

This idea of reducing dimensionality can be given a twist in the form of stacked matrix multiplication: one can think of the whole calculation as reducing each dimension of size D to O, where O is the order plus one, by multiplying by a D x O matrix holding the powers of the deltas along that dimension. In code it could look something like this:

def moments_central_even_faster(image, center=None, order=3):
    image = np.asarray(image)
    if center is None:
        center = np.array(image.shape, dtype=float) / 2
    center = np.asarray(center)
    assert center.ndim == 1 and center.size == image.ndim
    # deltas is a list of arrays holding the deltas from the center along each
    # dimension of the image.
    deltas = [np.arange(image.shape[dim], dtype=float) - center[dim]
              for dim in range(image.ndim)]
    calc = image
    for dim, delta in enumerate(deltas):
        powers_of_delta = delta[:, None] ** np.arange(order + 1)
        calc = np.rollaxis(calc, dim, image.ndim)
        calc = np.dot(calc, powers_of_delta)
        calc = np.rollaxis(calc, -1, dim)
    return calc

This turns out to be ridiculously fast for the 100 x 100 x 100 image case (1.41 s vs 520 ms vs 258 ms vs 10.5 ms), so over 100x (!!!) faster than your implementation. On a 1000 x 1000 2D image it is fast, but not the fastest (571 ms vs 122 ms vs 92.8 ms vs 113 ms); the fastest version there is probably in the same ballpark as the current Cython implementation. Let me know how your timings compare.

Jaime

On Fri, Apr 7, 2017 at 9:16 AM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Hi everyone,
I’ve started work on getting image moments working in nD. I have two working alternative implementations: one with NumPy broadcasting and one with a Numba JIT loop. Both are consistent with each other and with the existing Cython implementation, with two caveats:
1- my results are the *transpose* of the reference results. I suspected this might be because the reference uses x/y coordinates, but that doesn’t seem to be the case (see code below).
2- each of my implementations is ~10x slower than the Cython implementation.
Of these, problem 1 is the most pressing. (Problem 2 can be solved by special-casing 2D.) Can someone familiar with moments compare my implementations and the reference and tell me where I might have gone wrong? I’ve included all the code below for reference.
Ideas about speedups are also welcome! To that end, I’ve marked the slow lines in the broadcasting code. (I don’t know how to line profile in Numba; I think it’s currently impossible, in fact.)
Juan.
*————————— Reference results*
———————————— Values
(Notice the .T for `moments_central`, which is the existing Cython implementation)
In [189]: m.moments_central_nb(horse.astype(float), central_coords, 3)
Out[189]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [190]: moments_central(horse.astype(float), *central_coords, 3).T
Out[190]:
array([[  8.779e+04,  -1.621e-07,   1.301e+09,  -6.370e+09],
       [ -8.245e-07,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])

In [191]: m.moments_central(horse.astype(float), central_coords, 3)
Out[191]:
array([[  8.779e+04,  -1.037e-09,   1.301e+09,  -6.370e+09],
       [ -1.979e-09,   9.275e+07,  -3.430e+09,   1.938e+12],
       [  9.873e+08,  -1.363e+10,   1.349e+13,  -3.604e+14],
       [ -2.218e+10,   1.552e+12,  -2.849e+14,   3.245e+16]])
———————————— Timings
In [185]: %timeit moments_central(horse.astype(float), *central_coords, 3)
100 loops, best of 3: 3.14 ms per loop

In [187]: %timeit m.moments_central_nb(horse.astype(float), central_coords, 3)
10 loops, best of 3: 35.7 ms per loop

In [188]: %timeit m.moments_central(horse.astype(float), central_coords, 3)
10 loops, best of 3: 39.4 ms per loop
*————————— Implementations*
———————————— Existing Cython

cpdef moments_central(image_t image, double cr, double cc, Py_ssize_t order):
    cdef Py_ssize_t p, q, r, c
    cdef double[:, ::1] mu = np.zeros((order + 1, order + 1), dtype=np.double)
    cdef double val, dr, dc, dcp, drq
    for r in range(image.shape[0]):
        dr = r - cr
        for c in range(image.shape[1]):
            dc = c - cc
            val = image[r, c]
            dcp = 1
            for p in range(order + 1):
                drq = 1
                for q in range(order + 1):
                    mu[p, q] += val * drq * dcp
                    drq *= dr
                dcp *= dc
    return np.asarray(mu)
———————————— Broadcasting

import numpy as np
import numba
from functools import reduce
def moments_central(image, center=None, order=3):
    coords = np.nonzero(image)
    values = image[coords]
    coords_stack = np.column_stack(np.nonzero(image))
    if center is None:
        center = np.mean(coords_stack, axis=0)
    ccoords = coords_stack - center
    powers = np.arange(order + 1)
    result_shape = (order + 1,) * image.ndim + (coords_stack.shape[0],)
    coords_pow = []
    for dim in range(image.ndim):
        power_shape = [1,] * (image.ndim + 1)
        power_shape[dim] = order + 1
        powers = np.reshape(powers, power_shape)
        ccoords_pow = ccoords[:, dim] ** powers  # SLOW, 50% of time
        bcast_coords_pow = np.broadcast_to(ccoords_pow, result_shape)
        coords_pow.append(bcast_coords_pow)
    result = np.sum(reduce(np.multiply, coords_pow + [values]),
                    axis=-1)  # SLOW, 30% of time
    return result
———————————— Numba

def moments_central_nb(image, center=None, order=3):
    mu = np.zeros((order + 1,) * image.ndim)
    if center is None:
        center = np.mean(np.column_stack(np.nonzero(image)), axis=0)
    _moments_central_nb_loop(image, center, mu)
    return mu
@numba.njit
def _moments_central_nb_loop(image, center, mu):
    for coord, val in np.ndenumerate(image):
        if val > 0:
            for powers in np.ndindex(mu.shape):
                curval = val
                for c, ctr, p in zip(coord, center, powers):
                    dc = c - ctr
                    cr = 1
                    for pcount in range(p):
                        cr *= dc
                    curval *= cr
                mu[powers] += curval
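As a sanity check for the loop logic (a sketch: the plain-Python copy of the loop, the `moments_direct` reference, and the tiny test image are only for illustration), the same nested loops can be run without the JIT decorator and compared against the direct per-pixel definition:

```python
import itertools
import numpy as np

def _moments_loop_plain(image, center, mu):
    # Same logic as _moments_central_nb_loop, minus @numba.njit.
    for coord, val in np.ndenumerate(image):
        if val > 0:
            for powers in np.ndindex(mu.shape):
                curval = val
                for c, ctr, p in zip(coord, center, powers):
                    dc = c - ctr
                    cr = 1
                    for _ in range(p):
                        cr *= dc
                    curval *= cr
                mu[powers] += curval

def moments_direct(image, center, order):
    # Brute-force reference: value * prod of per-axis deltas ** powers.
    out = np.zeros((order + 1,) * image.ndim)
    for coord, val in np.ndenumerate(image):
        deltas = np.asarray(coord, dtype=float) - center
        for powers in itertools.product(range(order + 1), repeat=image.ndim):
            out[powers] += val * np.prod(deltas ** powers)
    return out

order = 2
img = np.arange(12.0).reshape(3, 4)
center = np.array([1.0, 1.5])
mu = np.zeros((order + 1,) * img.ndim)
_moments_loop_plain(img, center, mu)
assert np.allclose(mu, moments_direct(img, center, order))
```

The zeroth moment should equal the sum of the image, which is a cheap extra invariant to check.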
_______________________________________________
scikit-image mailing list
scikit-image@python.org
https://mail.python.org/mailman/listinfo/scikit-image
--
(\__/)
( O.o)
( > <)
This is Rabbit. Copy Rabbit into your signature and help him with his plans for world domination.