I am forwarding this email correspondence, as it may be relevant to others.

Stefan: Do you think a bag-of-words implementation would fit into scikit-image? I have some code that I would be happy to polish and contribute. The main problem is that bag-of-words relies on a k-means clustering method, which I would prefer to import from scikit-learn because the one from scipy is slow for a large number of samples. It is my impression that scikit-image tries to stay independent of scikit-learn.

Cheers,
Anders


---------- Forwarded message ----------
From: Anders Boesen Lindbo Larsen <abll@dtu.dk>
Date: Thu, Jul 18, 2013 at 10:00 AM
Subject: Re: about DAISY
To: Lisa Torrey <ltorrey@stlawu.edu>


Hi Lisa,

Cool problem; I have also read about it on the scikits-image mailing list.

I would start out with a simple approach called 'bag of words' (a.k.a.
'bag of features'). First, you sample a bunch of overlapping DAISY
features for a representative set of training images and perform a
clustering (e.g. k-means with k=1000) of these descriptors. You can
think of the cluster centers (a.k.a. visual words) as a vocabulary. An
image can now be described by extracting DAISY features and mapping
each feature to its nearest cluster center in the vocabulary. By
counting the number of occurrences of each visual word you end up with
a histogram which you can use for comparing images.
Bag-of-words models have proven quite successful for many flavors of
visual recognition because they are able to capture texture and image
structure in a generic manner. That is, you don't have to engineer the
model much to make it fit your problem.
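
The steps above can be sketched roughly as follows. This is only a
minimal illustration under stated assumptions, not the code I would
contribute: random arrays stand in for the descriptors (in practice
they could come from skimage.feature.daisy, reshaped to one row per
descriptor), the clustering is a plain NumPy Lloyd-style k-means with
a tiny k instead of a scikit-learn call, and the function names
build_vocabulary, assign_words and bow_histogram are made up for this
example.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Index of the nearest cluster center (visual word) per descriptor."""
    # Squared Euclidean distance between every descriptor and every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct training descriptors.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # leave an empty cluster where it is
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Normalised count of visual-word occurrences for one image."""
    labels = assign_words(descriptors, centers)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / counts.sum()


# Stand-in data (assumption): 500 training descriptors of length 8,
# and 100 descriptors extracted from one new image.
rng = np.random.default_rng(0)
train_descriptors = rng.random((500, 8))
image_descriptors = rng.random((100, 8))

vocab = build_vocabulary(train_descriptors, k=5)
hist = bow_histogram(image_descriptors, vocab)
```

The resulting hist has one bin per visual word and sums to 1, so
histograms from images with different numbers of descriptors can be
compared directly or fed to a classifier. For a realistic vocabulary
size such as k=1000, the all-pairs distance computation above becomes
memory-hungry, which is one reason to prefer a dedicated k-means
implementation.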

I'd be happy to help you if you have further questions.

Best,
Anders

On Tue, Jul 16, 2013 at 6:02 PM, Lisa Torrey <ltorrey@stlawu.edu> wrote:
> Hi Anders -
>
> I'm trying to determine if DAISY descriptors might be suitable for a problem
> I'm working on. I see that you have some expertise in this area, since you
> contributed the DAISY code to scikit-image, and I'm wondering if you'd be
> willing to let me know your thoughts.
>
> I'm mainly trying to understand if DAISY descriptors could be effectively
> used as features in a binary classification problem where the two image
> classes have a lot of internal variation.
>
> The two classes I'm working with are two types of moss. Type 1 is typically
> a stalk with leaves on it. Type 2 is typically a stalk with some branches
> coming off it, and leaves on the branches. But there's quite a bit of visual
> diversity within these types. A type represents a group of moss species that
> can look surprisingly different from each other. On top of that, the images
> I've got have no common size or orientation.
>
> If you have any thoughts, I'd love to hear them. I can share some examples
> of moss images if you're curious, but even a gut reaction would be helpful.
>
> -Lisa
>