Thank you! The clustering step is definitely what was missing from my understanding of the "bag of words" approach.

-Lisa

On Thursday, July 18, 2013 4:14:46 AM UTC-4, Anders Boesen Lindbo Larsen wrote:
I attach this mail correspondence as it may be relevant for others.
Stefan: Do you think a bag-of-words implementation would fit into scikit-image? I have some code that I would be happy to polish and contribute. The main problem is that bag-of-words relies on a k-means clustering method, which I would prefer to import from scikit-learn because the one from scipy is slow for a large number of samples. It is my impression that scikit-image tries to stay independent of scikit-learn.
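For a sense of scale, the vocabulary-building step could look something like the sketch below, using scikit-learn's MiniBatchKMeans (the scalable alternative to scipy's k-means mentioned above). The descriptor array here is a synthetic stand-in; in practice it would be stacked DAISY descriptors sampled from the training images:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(42)
# Stand-in for DAISY descriptors sampled from many training images;
# a real descriptor matrix easily reaches millions of rows.
descriptors = rng.random((10000, 200))

# Mini-batch k-means handles large sample counts much better than
# scipy.cluster.vq.kmeans; each cluster center becomes a "visual word".
kmeans = MiniBatchKMeans(n_clusters=100, batch_size=1024, n_init=3,
                         random_state=0)
kmeans.fit(descriptors)
vocabulary = kmeans.cluster_centers_  # shape: (100, 200)
```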
Cheers, Anders
---------- Forwarded message ---------- From: Anders Boesen Lindbo Larsen <ab...@dtu.dk> Date: Thu, Jul 18, 2013 at 10:00 AM Subject: Re: about DAISY To: Lisa Torrey <lto...@stlawu.edu>
Hi Lisa,
Cool problem; I have also read about it on the scikits-image mailing list.
I would start out with a simple approach called 'bag of words' (aka. 'bag of features'). First, you sample a bunch of overlapping DAISY features from a representative set of training images and perform a clustering (e.g. k-means with k=1000) of these descriptors. You can think of the cluster centers (aka. visual words) as a vocabulary. An image can now be described by extracting DAISY features and mapping each feature to its nearest cluster center in the vocabulary. By counting the number of occurrences of each visual word, you end up with a histogram which you can use for comparing images. Bag-of-words models have proven quite successful for many flavors of visual recognition because they are able to capture texture and image structure in a generic manner. That is, you don't have to engineer the model much to make it fit your problem.
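The mapping-and-counting step above can be sketched in a few lines. This is a minimal illustration: the "descriptors" and vocabulary are tiny synthetic arrays rather than real DAISY output (which would come from skimage.feature.daisy) and the vocabulary is assumed to have been clustered already:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and count occurrences."""
    # Pairwise distances, shape (n_descriptors, n_words).
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :],
                           axis=2)
    words = dists.argmin(axis=1)  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    # Normalize so images with different numbers of descriptors compare fairly.
    return hist / hist.sum()

# Toy example: 2-D "descriptors" scattered around a 3-word vocabulary.
rng = np.random.default_rng(0)
vocab = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
desc = rng.normal(loc=vocab[rng.integers(0, 3, size=50)], scale=0.5)
h = bow_histogram(desc, vocab)
```

Two such histograms can then be compared with any standard distance (e.g. chi-squared or histogram intersection) for the actual classification.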
I'd be happy to help you if you have further questions.
Best, Anders
On Tue, Jul 16, 2013 at 6:02 PM, Lisa Torrey <lto...@stlawu.edu> wrote:

Hi Anders -

I'm trying to determine if DAISY descriptors might be suitable for a problem that I'm working on. I see that you have some expertise in this area, since you contributed the DAISY code to scikit-image, and I'm wondering if you'd be willing to let me know your thoughts.

I'm mainly trying to understand if DAISY descriptors could be effectively used as features in a binary classification problem where the two image classes have a lot of internal variation.

The two classes I'm working with are two types of moss. Type 1 is typically a stalk with leaves on it. Type 2 is typically a stalk with some branches coming off it, and leaves on the branches. But there's quite a bit of visual diversity within these types. A type represents a group of moss species that can look surprisingly different from each other. On top of that, the images I've got have no common size or orientation.
If you have any thoughts, I'd love to hear them. I can share some examples of moss images if you're curious, but even a gut reaction would be helpful.
-Lisa