Thanks for this. Simple functions are very good for one’s own images, as well as for understanding an algorithm, but for a widely used library like scikit-image, flexibility and robustness are at least as important as speed. In skimage, we aim to support floating point input images, for which your code won’t work. A lot of thought about edge cases has gone into the skimage implementation, which unfortunately could have an effect on performance. The question we need to solve isn’t “can we make a super-fast watershed implementation?” but “can we make a flexible and robust implementation that is also fast?” Using Numba is certainly not off-limits, but any candidate implementation should at a minimum pass the skimage test suite.
I once tried a level-by-level implementation of watershed, by the way. It has a fatal flaw, which is that valleys with no markers will never get labeled. Here’s a test case that works with skimage but fails in your implementation. (Note also that your implementation overwrites the seeds image, which is pretty crazy. =)
Juan.
In [44]: import watershed
In [45]: from skimage import morphology
In [46]: import numpy as np
In [47]: image = np.array([1, 0, 1, 0, 1, 0, 1], dtype=np.uint8)
In [48]: seeds = np.array([0, 1, 0, 0, 0, 2, 0])
In [49]: morphology.watershed(image, seeds)
Out[49]: array([1, 1, 1, 1, 2, 2, 2], dtype=int32)
In [50]: watershed.watershed(image, seeds)
Out[50]: array([0, 0, 0, 0, 0, 0, 0])
In [51]: seeds
Out[51]: array([0, 0, 0, 0, 0, 0, 0])
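
For comparison, here is a minimal 1-D sketch of priority-queue flooding, which is one way to avoid the unlabeled-valley problem. It is not the skimage implementation. The function name flood_watershed_1d and the tie-breaking rule are assumptions made for illustration only. It copies the seeds instead of overwriting them, and it only compares pixel values, so floating point input works too.

```python
import heapq

import numpy as np


def flood_watershed_1d(image, seeds):
    """Illustrative priority-queue flooding on a 1-D array (hypothetical helper)."""
    image = np.asarray(image)
    # Work on a copy so the caller's seeds array is left untouched.
    labels = np.array(seeds, dtype=np.int32)
    heap = []
    order = 0  # tie-breaker: among equal values, earlier pushes pop first
    for i in np.flatnonzero(labels):
        heapq.heappush(heap, (float(image[i]), order, int(i)))
        order += 1
    while heap:
        _, _, i = heapq.heappop(heap)
        for j in (i - 1, i + 1):
            # Claim each unlabeled neighbor, then queue it by its own value,
            # so an unmarked valley is reached once a neighbor is flooded.
            if 0 <= j < labels.size and labels[j] == 0:
                labels[j] = labels[i]
                heapq.heappush(heap, (float(image[j]), order, j))
                order += 1
    return labels


image = np.array([1, 0, 1, 0, 1, 0, 1], dtype=np.uint8)
seeds = np.array([0, 1, 0, 0, 0, 2, 0])
result = flood_watershed_1d(image, seeds)
float_result = flood_watershed_1d(image / 2.0, seeds)
```

On the test case above, both result and float_result come out as [1, 1, 1, 1, 2, 2, 2], and seeds is unchanged afterwards. Because pixels are queued by value rather than processed one level at a time, the unmarked valley at index 3 gets picked up as soon as its neighbor is flooded.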