Oversplitting by watershed

Mon Nov 12 22:42:44 EST 2012

Thanks for the detailed response!

On Mon, Nov 12, 2012 at 6:57 PM, Josh Warner <silvertrumpet999 at gmail.com>wrote:

> I've looked at these two algorithms, and the biggest difference seems to
> be in the output.  `is_local_maximum` returns a Boolean array, while
> `peak_local_max` returns the indices of points corresponding to maxima (a
> la `np.sort` vs. `np.argsort`, though you cannot just pass the output of
> `peak_local_max` as indices).
>
> The other differences are more subtle, but significant.
>
> The API for `peak_local_max` could use some cleanup - the first threshold
> kwarg is set to 'deprecated'(!) and IMHO should be removed - but this
> algorithm allows finer control over thresholding and peak searching.
>

I think it's marked 'deprecated' so that it can be removed gracefully (i.e.
it gives people time to change their code). That said, it was marked
'deprecated' a few releases back, so it's probably ripe for removal.

> This would be good to avoid finding phantom peaks in noise if large dark
> regions were present. One significant drawback of this algorithm is the
> min_distance kwarg is set by default to 10, rather arbitrary, and ANY input
> (even the minimum value of 1) excludes both neighboring pixels AND the
> border.  See example below.
>
> In contrast, `is_local_maximum` has a much simpler API.  It doesn't have
> the finer thresholding / peak searching controls, but has a unique ability
> to search for peaks ONLY within arbitrary, connected, labeled regions.
>  This has some interesting potentials for masking etc, though I believe
> within each label only one peak will be found.  This algorithm also has the
> ability to search arbitrary local regions for peaks using something akin to
> a morphological structuring element, through the `footprint=` kwarg.  The
> documentation for this could probably be clarified.
>
> The way `peak_local_max` excludes borders concerns me for general use, as
> does its default `min_distance=10`, and personally I would prefer to wok
> around the limitations in `is_local_maximum`.
>
> A best-of-both-worlds combination could probably be created without overly
> much effort...
>

This is a great summary of the API differences. I agree that excluding the
border region is a bit of a wart. (My guess is that this could be fixed by
changing the boundary condition on the maximum filter used in
`peak_local_max`.) Although I agree that defaulting to `min_distance=10` is
arbitrary, I'm not sure there's an obvious choice. I would normally assume
that peaks separated by 1 pixel are just noise.

The footprint and mask parameter would definitely be a nice addition to
`peak_local_max`.

If you have time, a pull request to address some or all of these issues
would be great. If not, maybe you could this as an issue on github?

Thanks,
-Tony

>
>
> Snippet showing border-excluding behavior of `peak_local_max`, which will
> only get worse with higher values of `min_distance`:
>
> import numpy as np
> import matplotlib.pyplot as plt
> from skimage.feature import peak_local_max
> from skimage.morphology import is_local_maximum
>
> # Generate standardized random data
> np.random.seed(seed=1234)
> testim = np.random.randint(0, 255, size=(20, 20))
>
> # Find peaks using both methods
> ismax   = is_local_maximum(testim)                  # Boolean image
> returned
> peakmax = peak_local_max(testim, min_distance=1)    # (M, 2) indices
> returned
>
> # `peakmax` not plottable - placing values in 2d array
> Ipeakmax = np.zeros(testim.shape)
> Ipeakmax[peakmax[:, 0], peakmax[:, 1]] = 1
>
> # Show the results
> fig, ax = plt.subplots(ncols=2, nrows=1)
> ax[0].imshow(ismax, cmap='gray')
> ax[0].set_title('Peaks found by `is_local_maximum`')
> ax[1].imshow(Ipeakmax, cmap='gray')
> ax[1].set_title('Peaks found by `peak_local_max`')
>
> plt.show()
>
>
>
> On Monday, November 12, 2012 3:57:28 PM UTC-6, Tony S Yu wrote:
>
>>
>>
>> On Mon, Nov 12, 2012 at 7:43 AM, Frank <pennek... at googlemail.com> wrote:
>>
>>> Dear group,
>>>
>>> I have some issues with the watershed algorithm implemented in scikits
>>> image. I use a global threshold to segment cells from background, but some
>>> cells touch and I want them to be split. Watershed seems the appropriate
>>> way to deal with my problem, however my particles are split in too many
>>> pieces. Is there a way to adjust the sensitivity of the watershed method?
>>>
>>> Many thanks for any suggestion!
>>>
>>> The code that I use looks like below. An example image that I want to
>>> process can be downloaded here: https://dl.dropbox.com/u/**
>>> 10373933/test.jpg <https://dl.dropbox.com/u/10373933/test.jpg>
>>>
>>> # packages needed to perform image processing and analysis
>>> import numpy as np
>>> import scipy as scp
>>> import matplotlib.pyplot as plt
>>> import matplotlib.image as mpimg
>>> import scipy.ndimage as nd
>>> import skimage
>>> from skimage import io
>>> from skimage.morphology import watershed, is_local_maximum
>>> from skimage.segmentation import find_boundaries, visualize_boundaries
>>> from skimage.color import gray2rgb
>>>
>>> #read files jpeg file
>>> image = mpimg.imread('c:\\test.jpg')
>>> image_thresh = image > 140
>>> labels = nd.label(image_thresh)[0]
>>> distance = nd.distance_transform_edt(**image_thresh)
>>> local_maxi = is_local_maximum(distance, labels=labels,
>>> footprint=np.ones((9, 9)))
>>> markers = nd.label(local_maxi)[0]
>>> labelled_image = watershed(-distance, markers, mask=image_thresh)
>>>
>>> #find outline of objects for plotting
>>> boundaries = find_boundaries(labelled_**image)
>>> img_rgb = gray2rgb(image)
>>> overlay = np.flipud(visualize_**boundaries(img_rgb,boundaries)**)
>>> imshow(overlay)
>>
>>
>> Hi Frank,
>>
>> Actually, I don't think the issue is in the watershed segmentation.
>> Instead, I think the problem is in the marker specification: Using local
>> maxima creates too many marker points when a blob deviates greatly from a
>> circle. (BTW, does anyone know if there are any differences between
>> `is_local_maximum` and `peak_local_max`? Maybe the former should be
>> deprecated.)
>>
>> Using the centroids of blobs gives cleaner results. See slightly-modified
>> example below.
>>
>> Best,
>> -Tony
>>
>> # packages needed to perform image processing and analysis
>> import numpy as np
>> import matplotlib.pyplot as plt
>> import scipy.ndimage as nd
>>
>> from skimage import io
>> from skimage import measure
>> from skimage.morphology import watershed
>> from skimage.segmentation import find_boundaries, visualize_boundaries
>> from skimage.color import gray2rgb
>>
>> #read files jpeg file
>> image = io.imread('test.jpg')
>>
>> image_thresh = image > 140
>> labels = nd.label(image_thresh)[0]
>> distance = nd.distance_transform_edt(**image_thresh)
>>
>> props = measure.regionprops(labels, ['Centroid'])
>> coords = np.array([np.round(p['**Centroid']) for p in props], dtype=int)
>> # Create marker image where blob centroids are marked True
>> markers = np.zeros(image.shape, dtype=bool)
>> markers[tuple(np.transpose(**coords))] = True
>>
>> labelled_image = watershed(-distance, markers, mask=image_thresh)
>>
>> #find outline of objects for plotting
>> boundaries = find_boundaries(labelled_**image)
>> img_rgb = gray2rgb(image)
>> overlay = visualize_boundaries(img_rgb, boundaries, color=(1, 0, 0))
>>
>> plt.imshow(overlay)
>> plt.show()
>>
>  --
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-image/attachments/20121112/dfc51ba4/attachment.html>