Re: Oversplitting by watershed
I've looked at these two algorithms, and the biggest difference seems to be in the output. `is_local_maximum` returns a Boolean array, while `peak_local_max` returns the indices of points corresponding to maxima (a la `np.sort` vs. `np.argsort`, though you cannot just pass the output of `peak_local_max` as indices). The other differences are more subtle, but significant. The API for `peak_local_max` could use some cleanup - the first threshold kwarg is set to 'deprecated'(!) and IMHO should be removed - but this algorithm allows finer control over thresholding and peak searching. This would be good to avoid finding phantom peaks in noise if large dark regions were present. One significant drawback of this algorithm is the min_distance kwarg is set by default to 10, rather arbitrary, and ANY input (even the minimum value of 1) excludes both neighboring pixels AND the border. See example below. In contrast, `is_local_maximum` has a much simpler API. It doesn't have the finer thresholding / peak searching controls, but has a unique ability to search for peaks ONLY within arbitrary, connected, labeled regions. This has some interesting potentials for masking etc, though I believe within each label only one peak will be found. This algorithm also has the ability to search arbitrary local regions for peaks using something akin to a morphological structuring element, through the `footprint=` kwarg. The documentation for this could probably be clarified. The way `peak_local_max` excludes borders concerns me for general use, as does its default `min_distance=10`, and personally I would prefer to wok around the limitations in `is_local_maximum`. A best-of-both-worlds combination could probably be created without overly much effort... Snippet showing border-excluding behavior of `peak_local_max`, which will only get worse with higher values of `min_distance`: import numpy as np import matplotlib.pyplot as plt from skimage.feature import peak_local_max from skimage.morphology import is_local_maximum # Generate standardized random data np.random.seed(seed=1234) testim = np.random.randint(0, 255, size=(20, 20)) # Find peaks using both methods ismax = is_local_maximum(testim) # Boolean image returned peakmax = peak_local_max(testim, min_distance=1) # (M, 2) indices returned # `peakmax` not plottable - placing values in 2d array Ipeakmax = np.zeros(testim.shape) Ipeakmax[peakmax[:, 0], peakmax[:, 1]] = 1 # Show the results fig, ax = plt.subplots(ncols=2, nrows=1) ax[0].imshow(ismax, cmap='gray') ax[0].set_title('Peaks found by `is_local_maximum`') ax[1].imshow(Ipeakmax, cmap='gray') ax[1].set_title('Peaks found by `peak_local_max`') plt.show() On Monday, November 12, 2012 3:57:28 PM UTC-6, Tony S Yu wrote:
On Mon, Nov 12, 2012 at 7:43 AM, Frank <pennek...@googlemail.com<javascript:>
wrote:
Dear group,
I have some issues with the watershed algorithm implemented in scikits image. I use a global threshold to segment cells from background, but some cells touch and I want them to be split. Watershed seems the appropriate way to deal with my problem, however my particles are split in too many pieces. Is there a way to adjust the sensitivity of the watershed method?
Many thanks for any suggestion!
The code that I use looks like below. An example image that I want to process can be downloaded here: https://dl.dropbox.com/u/10373933/test.jpg
# packages needed to perform image processing and analysis import numpy as np import scipy as scp import matplotlib.pyplot as plt import matplotlib.image as mpimg import scipy.ndimage as nd import skimage from skimage import io from skimage.morphology import watershed, is_local_maximum from skimage.segmentation import find_boundaries, visualize_boundaries from skimage.color import gray2rgb
#read files jpeg file image = mpimg.imread('c:\\test.jpg') image_thresh = image > 140 labels = nd.label(image_thresh)[0] distance = nd.distance_transform_edt(image_thresh) local_maxi = is_local_maximum(distance, labels=labels, footprint=np.ones((9, 9))) markers = nd.label(local_maxi)[0] labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting boundaries = find_boundaries(labelled_image) img_rgb = gray2rgb(image) overlay = np.flipud(visualize_boundaries(img_rgb,boundaries)) imshow(overlay)
Hi Frank,
Actually, I don't think the issue is in the watershed segmentation. Instead, I think the problem is in the marker specification: Using local maxima creates too many marker points when a blob deviates greatly from a circle. (BTW, does anyone know if there are any differences between `is_local_maximum` and `peak_local_max`? Maybe the former should be deprecated.)
Using the centroids of blobs gives cleaner results. See slightly-modified example below.
Best, -Tony
# packages needed to perform image processing and analysis import numpy as np import matplotlib.pyplot as plt import scipy.ndimage as nd
from skimage import io from skimage import measure from skimage.morphology import watershed from skimage.segmentation import find_boundaries, visualize_boundaries from skimage.color import gray2rgb
#read files jpeg file image = io.imread('test.jpg')
image_thresh = image > 140 labels = nd.label(image_thresh)[0] distance = nd.distance_transform_edt(image_thresh)
props = measure.regionprops(labels, ['Centroid']) coords = np.array([np.round(p['Centroid']) for p in props], dtype=int) # Create marker image where blob centroids are marked True markers = np.zeros(image.shape, dtype=bool) markers[tuple(np.transpose(coords))] = True
labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting boundaries = find_boundaries(labelled_image) img_rgb = gray2rgb(image) overlay = visualize_boundaries(img_rgb, boundaries, color=(1, 0, 0))
plt.imshow(overlay) plt.show()
Thanks for the detailed response! On Mon, Nov 12, 2012 at 6:57 PM, Josh Warner <silvertrumpet999@gmail.com>wrote:
I've looked at these two algorithms, and the biggest difference seems to be in the output. `is_local_maximum` returns a Boolean array, while `peak_local_max` returns the indices of points corresponding to maxima (a la `np.sort` vs. `np.argsort`, though you cannot just pass the output of `peak_local_max` as indices).
The other differences are more subtle, but significant.
The API for `peak_local_max` could use some cleanup - the first threshold kwarg is set to 'deprecated'(!) and IMHO should be removed - but this algorithm allows finer control over thresholding and peak searching.
I think it's marked 'deprecated' so that it can be removed gracefully (i.e. it gives people time to change their code). That said, it was marked 'deprecated' a few releases back, so it's probably ripe for removal.
This would be good to avoid finding phantom peaks in noise if large dark regions were present. One significant drawback of this algorithm is the min_distance kwarg is set by default to 10, rather arbitrary, and ANY input (even the minimum value of 1) excludes both neighboring pixels AND the border. See example below.
In contrast, `is_local_maximum` has a much simpler API. It doesn't have the finer thresholding / peak searching controls, but has a unique ability to search for peaks ONLY within arbitrary, connected, labeled regions. This has some interesting potentials for masking etc, though I believe within each label only one peak will be found. This algorithm also has the ability to search arbitrary local regions for peaks using something akin to a morphological structuring element, through the `footprint=` kwarg. The documentation for this could probably be clarified.
The way `peak_local_max` excludes borders concerns me for general use, as does its default `min_distance=10`, and personally I would prefer to wok around the limitations in `is_local_maximum`.
A best-of-both-worlds combination could probably be created without overly much effort...
This is a great summary of the API differences. I agree that excluding the border region is a bit of a wart. (My guess is that this could be fixed by changing the boundary condition on the maximum filter used in `peak_local_max`.) Although I agree that defaulting to `min_distance=10` is arbitrary, I'm not sure there's an obvious choice. I would normally assume that peaks separated by 1 pixel are just noise. The footprint and mask parameter would definitely be a nice addition to `peak_local_max`. If you have time, a pull request to address some or all of these issues would be great. If not, maybe you could this as an issue on github? Thanks, -Tony
Snippet showing border-excluding behavior of `peak_local_max`, which will only get worse with higher values of `min_distance`:
import numpy as np import matplotlib.pyplot as plt from skimage.feature import peak_local_max from skimage.morphology import is_local_maximum
# Generate standardized random data np.random.seed(seed=1234) testim = np.random.randint(0, 255, size=(20, 20))
# Find peaks using both methods ismax = is_local_maximum(testim) # Boolean image returned peakmax = peak_local_max(testim, min_distance=1) # (M, 2) indices returned
# `peakmax` not plottable - placing values in 2d array Ipeakmax = np.zeros(testim.shape) Ipeakmax[peakmax[:, 0], peakmax[:, 1]] = 1
# Show the results fig, ax = plt.subplots(ncols=2, nrows=1) ax[0].imshow(ismax, cmap='gray') ax[0].set_title('Peaks found by `is_local_maximum`') ax[1].imshow(Ipeakmax, cmap='gray') ax[1].set_title('Peaks found by `peak_local_max`')
plt.show()
On Monday, November 12, 2012 3:57:28 PM UTC-6, Tony S Yu wrote:
On Mon, Nov 12, 2012 at 7:43 AM, Frank <pennek...@googlemail.com> wrote:
Dear group,
I have some issues with the watershed algorithm implemented in scikits image. I use a global threshold to segment cells from background, but some cells touch and I want them to be split. Watershed seems the appropriate way to deal with my problem, however my particles are split in too many pieces. Is there a way to adjust the sensitivity of the watershed method?
Many thanks for any suggestion!
The code that I use looks like below. An example image that I want to process can be downloaded here: https://dl.dropbox.com/u/** 10373933/test.jpg <https://dl.dropbox.com/u/10373933/test.jpg>
# packages needed to perform image processing and analysis import numpy as np import scipy as scp import matplotlib.pyplot as plt import matplotlib.image as mpimg import scipy.ndimage as nd import skimage from skimage import io from skimage.morphology import watershed, is_local_maximum from skimage.segmentation import find_boundaries, visualize_boundaries from skimage.color import gray2rgb
#read files jpeg file image = mpimg.imread('c:\\test.jpg') image_thresh = image > 140 labels = nd.label(image_thresh)[0] distance = nd.distance_transform_edt(**image_thresh) local_maxi = is_local_maximum(distance, labels=labels, footprint=np.ones((9, 9))) markers = nd.label(local_maxi)[0] labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting boundaries = find_boundaries(labelled_**image) img_rgb = gray2rgb(image) overlay = np.flipud(visualize_**boundaries(img_rgb,boundaries)**) imshow(overlay)
Hi Frank,
Actually, I don't think the issue is in the watershed segmentation. Instead, I think the problem is in the marker specification: Using local maxima creates too many marker points when a blob deviates greatly from a circle. (BTW, does anyone know if there are any differences between `is_local_maximum` and `peak_local_max`? Maybe the former should be deprecated.)
Using the centroids of blobs gives cleaner results. See slightly-modified example below.
Best, -Tony
# packages needed to perform image processing and analysis import numpy as np import matplotlib.pyplot as plt import scipy.ndimage as nd
from skimage import io from skimage import measure from skimage.morphology import watershed from skimage.segmentation import find_boundaries, visualize_boundaries from skimage.color import gray2rgb
#read files jpeg file image = io.imread('test.jpg')
image_thresh = image > 140 labels = nd.label(image_thresh)[0] distance = nd.distance_transform_edt(**image_thresh)
props = measure.regionprops(labels, ['Centroid']) coords = np.array([np.round(p['**Centroid']) for p in props], dtype=int) # Create marker image where blob centroids are marked True markers = np.zeros(image.shape, dtype=bool) markers[tuple(np.transpose(**coords))] = True
labelled_image = watershed(-distance, markers, mask=image_thresh)
#find outline of objects for plotting boundaries = find_boundaries(labelled_**image) img_rgb = gray2rgb(image) overlay = visualize_boundaries(img_rgb, boundaries, color=(1, 0, 0))
plt.imshow(overlay) plt.show()
--
On Mon, Nov 12, 2012 at 3:57 PM, Josh Warner <silvertrumpet999@gmail.com> wrote:
In contrast, `is_local_maximum` has a much simpler API. It doesn't have the
Just for the record, `is_local_maximum` is mentioned in: http://shop.oreilly.com/product/0636920020219.do So, if we could write a unified backend but still expose this function, that'd be good! Stéfan
participants (3)
-
Josh Warner
-
Stéfan van der Walt
-
Tony Yu