New subject: Oversplitting by watershed

12 Nov 2012

      I've looked at these two algorithms, and the biggest difference seems to be 
in the output.  `is_local_maximum` returns a Boolean array, while 
`peak_local_max` returns the indices of points corresponding to maxima (a 
la `np.sort` vs. `np.argsort`, though you cannot just pass the output of 
`peak_local_max` as indices). 

The other differences are more subtle, but significant.

The API for `peak_local_max` could use some cleanup - the first threshold 
kwarg is set to 'deprecated'(!) and IMHO should be removed - but this 
algorithm allows finer control over thresholding and peak searching. This 
would be good to avoid finding phantom peaks in noise if large dark regions 
were present. One significant drawback of this algorithm is the 
min_distance kwarg is set by default to 10, rather arbitrary, and ANY input 
(even the minimum value of 1) excludes both neighboring pixels AND the 
border.  See example below.

In contrast, `is_local_maximum` has a much simpler API.  It doesn't have 
the finer thresholding / peak searching controls, but has a unique ability 
to search for peaks ONLY within arbitrary, connected, labeled regions. 
 This has some interesting potentials for masking etc, though I believe 
within each label only one peak will be found.  This algorithm also has the 
ability to search arbitrary local regions for peaks using something akin to 
a morphological structuring element, through the `footprint=` kwarg.  The 
documentation for this could probably be clarified.  

The way `peak_local_max` excludes borders concerns me for general use, as 
does its default `min_distance=10`, and personally I would prefer to wok 
around the limitations in `is_local_maximum`.  

A best-of-both-worlds combination could probably be created without overly 
much effort...

Snippet showing border-excluding behavior of `peak_local_max`, which will 
only get worse with higher values of `min_distance`:

import numpy as np
import matplotlib.pyplot as plt
from skimage.feature import peak_local_max
from skimage.morphology import is_local_maximum

# Generate standardized random data
np.random.seed(seed=1234)
testim = np.random.randint(0, 255, size=(20, 20))

# Find peaks using both methods
ismax   = is_local_maximum(testim)                  # Boolean image returned
peakmax = peak_local_max(testim, min_distance=1)    # (M, 2) indices 
returned

# `peakmax` not plottable - placing values in 2d array
Ipeakmax = np.zeros(testim.shape)
Ipeakmax[peakmax[:, 0], peakmax[:, 1]] = 1

# Show the results
fig, ax = plt.subplots(ncols=2, nrows=1)
ax[0].imshow(ismax, cmap='gray')
ax[0].set_title('Peaks found by `is_local_maximum`')
ax[1].imshow(Ipeakmax, cmap='gray')
ax[1].set_title('Peaks found by `peak_local_max`')

plt.show()

On Monday, November 12, 2012 3:57:28 PM UTC-6, Tony S Yu wrote:
...
On Mon, Nov 12, 2012 at 7:43 AM, Frank <pennek...@googlemail.com<javascript:>
...
wrote:
...
Dear group,
I have some issues with the watershed algorithm implemented in scikits 
image. I use a global threshold to segment cells from background, but some 
cells touch and I want them to be split. Watershed seems the appropriate 
way to deal with my problem, however my particles are split in too many 
pieces. Is there a way to adjust the sensitivity of the watershed method?
Many thanks for any suggestion!
The code that I use looks like below. An example image that I want to 
process can be downloaded here: 
https://dl.dropbox.com/u/10373933/test.jpg
# packages needed to perform image processing and analysis
import numpy as np
import scipy as scp
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import scipy.ndimage as nd
import skimage
from skimage import io
from skimage.morphology import watershed, is_local_maximum
from skimage.segmentation import find_boundaries, visualize_boundaries
from skimage.color import gray2rgb
#read files jpeg file
image = mpimg.imread('c:\\test.jpg')
image_thresh = image > 140
labels = nd.label(image_thresh)[0]
distance = nd.distance_transform_edt(image_thresh)
local_maxi = is_local_maximum(distance, labels=labels, 
footprint=np.ones((9, 9)))
markers = nd.label(local_maxi)[0]
labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting
boundaries = find_boundaries(labelled_image)
img_rgb = gray2rgb(image)
overlay = np.flipud(visualize_boundaries(img_rgb,boundaries))
imshow(overlay)
Hi Frank,
Actually, I don't think the issue is in the watershed segmentation. 
Instead, I think the problem is in the marker specification: Using local 
maxima creates too many marker points when a blob deviates greatly from a 
circle. (BTW, does anyone know if there are any differences between 
`is_local_maximum` and `peak_local_max`? Maybe the former should be 
deprecated.)
Using the centroids of blobs gives cleaner results. See slightly-modified 
example below.
Best,
-Tony
# packages needed to perform image processing and analysis
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage as nd
from skimage import io
from skimage import measure
from skimage.morphology import watershed
from skimage.segmentation import find_boundaries, visualize_boundaries
from skimage.color import gray2rgb
#read files jpeg file
image = io.imread('test.jpg')
image_thresh = image > 140
labels = nd.label(image_thresh)[0]
distance = nd.distance_transform_edt(image_thresh)
props = measure.regionprops(labels, ['Centroid'])
coords = np.array([np.round(p['Centroid']) for p in props], dtype=int)
# Create marker image where blob centroids are marked True
markers = np.zeros(image.shape, dtype=bool)
markers[tuple(np.transpose(coords))] = True
labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting
boundaries = find_boundaries(labelled_image)
img_rgb = gray2rgb(image)
overlay = visualize_boundaries(img_rgb, boundaries, color=(1, 0, 0))
plt.imshow(overlay)
plt.show()

Re: Oversplitting by watershed

Josh Warner

Tony Yu

Stéfan van der Walt

tags

participants (3)