random walker segmentation / blob detection

Wed Feb 8 08:46:53 EST 2012

Hi Michael,

sorry for not being very responsive! For the random walker algorithm,
markers basically have to be pixels for which you are sure that they
belong to one phase (say, the "blobs" phase or the "normal
ground" phase). Most of the time markers are determined from the histogram
(e.g., taking the 10% darkest and lightest pixels), but features other than
grey values can be used to decide whether you can label a pixel as a marker
(such as local variance or Haralick features -- this is in fact a
classification task).
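
As a rough sketch of the histogram-based choice, assuming a 2-D float
image: the function name percentile_markers, the label convention (1 for
dark, 2 for light, 0 for unlabeled) and the synthetic test image are all
made up for illustration.

```python
import numpy as np

def percentile_markers(image, fraction=10.0):
    """Label the darkest and lightest `fraction` percent of the pixels.

    Returns an integer array: 1 = dark marker, 2 = light marker,
    0 = unlabeled (left for the random walker to decide).
    """
    low, high = np.percentile(image, [fraction, 100.0 - fraction])
    markers = np.zeros(image.shape, dtype=np.int32)
    markers[image <= low] = 1
    markers[image >= high] = 2
    return markers

# Synthetic example (an assumption, not the real data):
# a dark square "blob" on a lighter ground.
rng = np.random.default_rng(0)
image = rng.normal(loc=0.7, scale=0.05, size=(64, 64))
image[20:40, 20:40] = rng.normal(loc=0.3, scale=0.05, size=(20, 20))
markers = percentile_markers(image, fraction=10.0)
```

Note that fixed percentiles always produce markers of both kinds, even
when the image contains no blob at all.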

Of course, this implies that you know the number of phases beforehand,
that is, if you know that you have blobs, then you can start thinking of
determining markers and using the random walker. But this won't tell you
whether you have blobs or not.

For your blobs image, when I plot the histogram I can see that it is
bimodal. The first peak corresponds mostly to the blob
pixels, while the second one corresponds to the normal ground. What I
would do would be to detect a local minimum of the histogram (or another
fancier classification technique -- I tried Gaussian mixture models from
scikit-learn but the peaks are definitely not Gaussian so it didn't work
at all). Then you can take a threshold value that corresponds to, say, 80%
of the pixels that have a value smaller than the local minimum (using
scipy.stats.scoreatpercentile), and do the same for the other peak. This
ensures that you have only a small number of unlabeled pixels having grey
values around the local minimum, and the random walker should do well in
this case (at least, it does on this image, as I checked!). Another rule of
thumb with the random walker is that markers should be distributed in
space as homogeneously as possible (I guess I should add that in the
docstring!).
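
The marker selection described above could be sketched as follows. This
is only one possible reading: the helper names valley_between_modes and
markers_from_valley, the number of bins, the smoothing width and the
minimum peak distance are arbitrary assumptions, np.percentile stands in
for scipy.stats.scoreatpercentile, and the image is synthetic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def valley_between_modes(image, bins=64, sigma=2.0):
    """Grey value at the histogram minimum between the two tallest peaks."""
    counts, edges = np.histogram(image, bins=bins)
    smooth = gaussian_filter1d(counts.astype(float), sigma)
    # Require well-separated peaks so that noise on one mode is not
    # mistaken for a second mode.
    peaks, _ = find_peaks(smooth, distance=bins // 4)
    if len(peaks) < 2:
        raise ValueError("histogram does not look bimodal")
    # Two tallest peaks, in left-to-right order.
    left, right = np.sort(peaks[np.argsort(smooth[peaks])[-2:]])
    valley_bin = left + np.argmin(smooth[left:right + 1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[valley_bin]

def markers_from_valley(image, keep=80.0):
    """Mark `keep` percent of the pixels on each side of the valley."""
    valley = valley_between_modes(image)
    dark = image[image < valley]
    light = image[image >= valley]
    t_dark = np.percentile(dark, keep)            # darkest 80% of the dark side
    t_light = np.percentile(light, 100.0 - keep)  # lightest 80% of the light side
    markers = np.zeros(image.shape, dtype=np.int32)
    markers[image <= t_dark] = 1   # blob markers
    markers[image >= t_light] = 2  # ground markers
    return markers, valley

# Synthetic bimodal image (an assumption, not the real data).
rng = np.random.default_rng(0)
image = rng.normal(0.7, 0.05, size=(64, 64))
image[16:48, 16:48] = rng.normal(0.3, 0.05, size=(32, 32))
markers, valley = markers_from_valley(image, keep=80.0)
```

With keep=80, roughly 20% of the pixels stay unlabeled, all of them with
grey values close to the valley.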

What I also found with your image was that I had to change the beta
parameter of the random walker so that it works on this image, because
you have a lot of texture, therefore diffusion is very difficult and you
can end up with an ill-conditioned system if beta is too large. beta=10
was working fine for me. This texture is also the reason why I had to
take a large fraction of the two peaks as markers (80%, as I suggested
above) to get a good result, since diffusion is too difficult when there
are not enough markers.
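
For reference, a minimal sketch of passing beta to the random walker,
assuming skimage.segmentation.random_walker is available. The synthetic
textured image and the two hard thresholds used for the markers are
placeholders, not values taken from the real image.

```python
import numpy as np
from skimage.segmentation import random_walker

# Synthetic textured image (an assumption): dark noisy square on a
# lighter noisy ground.
rng = np.random.default_rng(0)
image = rng.normal(0.7, 0.1, size=(64, 64))
image[16:48, 16:48] = rng.normal(0.3, 0.1, size=(32, 32))

# Dense markers: 1 = blob, 2 = ground, 0 = to be decided.
markers = np.zeros(image.shape, dtype=np.int32)
markers[image < 0.3] = 1
markers[image > 0.7] = 2

# A smaller beta makes diffusion easier across textured regions.
labels = random_walker(image, markers, beta=10)
```

The returned labels array has the same shape as the image and contains
only the marker values, here 1 and 2.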

Of course, all this doesn't tell you whether you have blobs at all or
not, and this is -- as I understand it -- one of the primary pieces of
information that you want to obtain. When you have only one small blob it
is unlikely that the histogram has two peaks, and you won't be able to use
the above method. Are your images always taken with the same exposure
settings? What I mean is, can you know in advance what the grey values of
the blobs are?

What I understand from your two images is that you know that there are
blobs not only from the grey values but also because these regions have a
certain size and have less texture (a smaller local variance) than
elsewhere, so you might want to incorporate these criteria as well in
the choice of markers. If you can have hard thresholds for these criteria
(that are the same for all images), then it is much easier, but if you
can't, it's really a question of determining whether you have one or two
clusters in your feature space (features being grey values, local
variance etc.) and I don't know a proper method to do that.
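
As an illustration of the local variance feature, here is a small sketch
using scipy.ndimage.uniform_filter. The function name local_variance, the
window size of 7 pixels and the synthetic image are assumptions chosen
for the example.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(image, size=7):
    """Variance of the grey values in a size x size window around each pixel."""
    image = np.asarray(image, dtype=float)
    mean = uniform_filter(image, size=size)
    mean_sq = uniform_filter(image * image, size=size)
    # var = E[x^2] - E[x]^2; clip tiny negative values due to rounding.
    return np.clip(mean_sq - mean * mean, 0.0, None)

# Synthetic example (an assumption): a smooth dark square on a
# strongly textured ground.
rng = np.random.default_rng(0)
image = rng.normal(0.7, 0.1, size=(64, 64))
image[16:48, 16:48] = rng.normal(0.3, 0.01, size=(32, 32))
variance = local_variance(image, size=7)
```

Pixels that are both dark and have a low local variance would then be
candidates for blob markers, with thresholds still to be chosen.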

You may want to ask this last question on the scikit-learn mailing list as
well, because it is really a classification problem.

Hope this helps a little, and thanks for the interesting use case! Keep us
informed about your progress.

Cheers,
Emmanuelle

On Wed, Feb 08, 2012 at 01:52:35AM -0800, Michael Aye wrote:
>    Hi Stéfan,
>    thanks for playing with it, looks good.
>    Do you have any comment on the question I had with the random_walker
>    concerning the marker thresholds?
>    Best regards,
>    Michael