seeking advice on HoG applicability
Hi folks -

I'm looking at an image classification problem, and wondering whether HoG should be applicable to it. If any of you are willing to take a look and give some advice, that would be wonderful. I have a machine learning background, but working with image data is a new area for me.

The problem is to distinguish between two types of moss. Type 1 tends to consist of upright stalks with few or no branches. Type 2 tends to have secondary branches coming off the primary stalk. There's quite a bit of visual diversity within these types. I've linked some images below.

Type 1:
http://myslu.stlawu.edu/~ltorrey/moss/andrea_rothii.jpg
http://myslu.stlawu.edu/~ltorrey/moss/mnium_spinulosum.jpg

Type 2:
http://myslu.stlawu.edu/~ltorrey/moss/climacium_americanum.jpg
http://myslu.stlawu.edu/~ltorrey/moss/rhytidiadelphus_triquetrus.jpg

When I came across the Dalal paper, I thought my problem might have something in common with the pedestrian detection problem, so I tried extracting HoG features and feeding them into an SVM classifier. This failed miserably - the SVM does no better than random guessing. I'm now trying to weigh potential reasons.

The first possible reason on my list is the diversity among mosses of the same type. There isn't necessarily a "type 1 shape" and a "type 2 shape," at least not to the degree that there's a "pedestrian shape." Perhaps this means HoG isn't really the right approach to my problem after all?

Other reasons may include:
- I have much less data. (Just 77 positives and 78 negatives, compared to Dalal's 1239 and 12180.)
- My images aren't all the same size, like the pedestrian images are. (I'm not sure if this would matter?)
- My images are much higher resolution. (I've been downscaling them by a factor of 8, but the feature vectors are still enormous.)
- I'm just using default parameters so far. (In the absence of any signal, tweaking seems unproductive.)

Any thoughts or suggestions would be welcome!

-Lisa
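For concreteness, here is a minimal sketch of the pipeline described above (HoG features fed to a linear SVM), assuming every image is first resized to a common shape. The sample image, target size, and HoG parameters are illustrative stand-ins, not Lisa's actual setup:

```python
# Minimal sketch (not Lisa's actual code): HoG features -> linear SVM.
# The target shape and HoG parameters below are illustrative guesses.
import numpy as np
from skimage import color, data, transform
from skimage.feature import hog
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def hog_features(image, shape=(256, 256)):
    gray = color.rgb2gray(image) if image.ndim == 3 else image
    gray = transform.resize(gray, shape)  # common size -> common feature length
    return hog(gray, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2))

# Smoke test on a bundled sample image; with real data, X would stack
# hog_features(img) for every moss photo and y would hold the 0/1 labels.
x = hog_features(data.astronaut())
print(x.shape)

# scores = cross_val_score(LinearSVC(), X, y, cv=5)
```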
Hi Lisa,

That's an interesting problem! Off the top of my head, I have a couple of questions:

Are you making any attempt to mask the HoG features to the above-ground, green regions? The example images show the entire root structure, yet the visual classification you described applies exclusively to the greenery. The roots are not well separated from the soil, either, so that entire region may be confounding your task. This would be true for any feature algorithm, not just HoG.

Have you tried visualizing the HoG output with the `visualize` kwarg? This could give you a sense of what HoG is actually extracting.

It sounds like you may have a dimensionality problem thanks to high image resolution, combined with a relatively low number of images to compare. This can be partially addressed by tweaking HoG parameters (especially `pixels_per_cell`, I believe) or scaling your images down to a uniform, smaller size. In addition, scikit-learn has several dimensionality reduction algorithms, such as PCA, which can help reduce the number of features to a manageable level.

If I get the chance to directly play with your example pictures, I'll pop back in with a few more thoughts.

Good luck,
Josh
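As a concrete version of Josh's `visualize` suggestion, a sketch along these lines shows the input next to the rendered HoG cells; the sample image and cell size are stand-ins:

```python
# Sketch of inspecting HoG output via skimage's `visualize` kwarg
# (spelled `visualise` in some older releases). The sample image is a
# stand-in for a moss photo.
import matplotlib.pyplot as plt
from skimage import color, data
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())
features, hog_image = hog(image, pixels_per_cell=(16, 16), visualize=True)

fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 5))
ax0.imshow(image, cmap='gray')
ax0.set_title('input')
ax1.imshow(hog_image, cmap='gray')
ax1.set_title('HoG cells')
plt.show()
```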
Thanks! Good point about the soil and roots. I've been using the entire images so far, but maybe I should pre-filter them, and keep only the "greenish" areas, before doing any feature extraction.

I have visualized the output, and it looks like the gradients mostly go across leaf boundaries. That may be too fine-grained, since it's probably the moss/background boundary that is most important for this problem. Maybe I should actually segment (or skeletonize?) the image to get rid of the internal detail.

-Lisa
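A rough sketch of the pre-filtering idea Lisa floats here, keeping only "greenish" pixels via an HSV threshold; the hue and saturation bounds are guesses that would need tuning on real moss photos:

```python
# Hedged sketch: mask to "greenish" pixels before feature extraction.
# The hue/saturation bounds are guesses, not tuned values.
import numpy as np
from skimage import color, data

rgb = data.astronaut()                             # stand-in for a moss photo
hsv = color.rgb2hsv(rgb)
hue, sat = hsv[..., 0], hsv[..., 1]
mask = (hue > 0.17) & (hue < 0.45) & (sat > 0.2)   # rough "green" band
masked = rgb * mask[..., np.newaxis]               # zero out soil, roots, background

# For the skeletonization idea, skimage.morphology.skeletonize(mask)
# would reduce the binary moss region to its medial axis.
```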
Hi Lisa

Interestingly, Adam Wisniewski was working on this one-class classification problem at the recent SciPy2013 sprint. Olivier Grisel and Nelle Varoquaux from the sklearn team were able to give us some helpful advice, and it might be worth getting in touch with them as well.

On Wed, Jul 3, 2013 at 9:10 PM, Lisa Torrey <lisa.torrey@gmail.com> wrote:
- I have much less data. (Just 77 positives and 78 negatives, compared to Dalal's 1239 and 12180.)
You'll probably have to do some kind of cross-validation.
- My images aren't all the same size, like the pedestrian images are. (I'm not sure if this would matter?)
Perhaps investigate multi-scale texture features, such as the wavelet coefficients (see http://www.pybytes.com/pywavelets/ ; even simple statistics might suffice).
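A possible sketch of this wavelet idea, using PyWavelets with simple per-subband statistics; the wavelet choice and decomposition level are arbitrary illustrative choices:

```python
# Hedged sketch: 2-D wavelet decomposition via PyWavelets, summarized
# by simple statistics per subband. Because the level count is fixed,
# the feature vector has the same length regardless of image size.
import numpy as np
import pywt
from skimage import data

image = data.camera().astype(float)   # stand-in grayscale image
coeffs = pywt.wavedec2(image, wavelet='db2', level=3)

stats = []
for level in coeffs[1:]:              # (horizontal, vertical, diagonal) bands
    for band in level:
        stats.extend([band.mean(), band.std()])
feature = np.array(stats)
print(feature.shape)
```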
- My images are much higher resolution. (I've been downscaling them by a factor of 8, but the feature vectors are still enormous.)
You'd want to extract some features that help the classifier, e.g. DAISY (http://scikit-image.org/docs/dev/auto_examples/plot_daisy.html), texture features via grey-level co-occurrence matrices, or Haralick features (we don't yet have those in skimage, although they are available in Luis Coelho's Mahotas).

Regards
Stéfan
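For the grey-level co-occurrence route, a sketch using scikit-image's GLCM utilities might look like this; the distances, angles, and chosen properties are illustrative:

```python
# Hedged sketch of GLCM texture features with scikit-image's
# graycomatrix/graycoprops (named greycomatrix/greycoprops in older
# releases). Distances and angles are arbitrary illustrative choices.
import numpy as np
from skimage import data
from skimage.feature import graycomatrix, graycoprops
from skimage.util import img_as_ubyte

image = img_as_ubyte(data.camera())   # stand-in grayscale image
glcm = graycomatrix(image, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = np.hstack([graycoprops(glcm, prop).ravel()
                      for prop in ('contrast', 'homogeneity',
                                   'energy', 'correlation')])
print(features.shape)                 # small, fixed-length vector per image
```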
I attach this mail correspondence as it may be relevant for others.

Stefan: Do you think a bag of words implementation would fit into scikit-image? I have some code that I would be happy to polish and contribute. The main problem is that bag-of-words relies on a k-means clustering method, which I would prefer to import from scikit-learn because the one from scipy is slow for a large number of samples. It is my impression that scikit-image tries to stay independent of scikit-learn.

Cheers,
Anders

---------- Forwarded message ----------
From: Anders Boesen Lindbo Larsen <abll@dtu.dk>
Date: Thu, Jul 18, 2013 at 10:00 AM
Subject: Re: about DAISY
To: Lisa Torrey <ltorrey@stlawu.edu>

Hi Lisa,

Cool problem; I have also read about it on the scikits-image mailing list.

I would start out with a simple approach called 'bag of words' (aka 'bag of features'). First, you sample a bunch of overlapping DAISY features from a representative set of training images and perform a clustering (e.g. k-means with k=1000) of these descriptors. You can think of the cluster centers (aka visual words) as a vocabulary. An image can now be described by extracting DAISY features and mapping each feature to its nearest cluster center in the vocabulary. By counting the number of occurrences of each visual word, you end up with a histogram which you can use for comparing images. Bag of words models have proven quite successful for many flavors of visual recognition because they are able to capture texture and image structure in a generic manner. That is, you don't have to engineer the model much to make it fit your problem.

I'd be happy to help you if you have further questions.

Best,
Anders

On Tue, Jul 16, 2013 at 6:02 PM, Lisa Torrey <ltorrey@stlawu.edu> wrote:

Hi Anders -

I'm trying to determine if DAISY descriptors might be suitable for a problem that I'm working on. I see that you have some expertise in this area, since you contributed the DAISY code to scikit-image, and I'm wondering if you'd be willing to let me know your thoughts.

I'm mainly trying to understand if DAISY descriptors could be effectively used as features in a binary classification problem where the two image classes have a lot of internal variation.

The two classes I'm working with are two types of moss. Type 1 is typically a stalk with leaves on it. Type 2 is typically a stalk with some branches coming off it, and leaves on the branches. But there's quite a bit of visual diversity within these types. A type represents a group of moss species that can look surprisingly different from each other. On top of that, the images I've got have no common size or orientation.

If you have any thoughts, I'd love to hear them. I can share some examples of moss images if you're curious, but even a gut reaction would be helpful.

-Lisa
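A hedged sketch of the bag-of-words pipeline Anders outlines; the vocabulary size, DAISY parameters, and sample images are illustrative stand-ins (he suggests k around 1000 for real data):

```python
# Hedged sketch of bag-of-words over dense DAISY descriptors: cluster
# pooled descriptors into a vocabulary, then describe each image by a
# histogram of its nearest "visual words".
import numpy as np
from skimage import data
from skimage.feature import daisy
from sklearn.cluster import MiniBatchKMeans

def daisy_descriptors(image):
    desc = daisy(image, step=8, radius=8, rings=2, histograms=6,
                 orientations=8)              # (rows, cols, P) grid
    return desc.reshape(-1, desc.shape[-1])   # flatten to (N, P)

train_images = [data.camera(), data.coins()]  # stand-ins for moss photos
all_desc = np.vstack([daisy_descriptors(im) for im in train_images])

# Build the visual vocabulary by clustering the pooled descriptors.
kmeans = MiniBatchKMeans(n_clusters=50, n_init=3, random_state=0).fit(all_desc)

def bow_histogram(image):
    words = kmeans.predict(daisy_descriptors(image))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()  # normalized, so image size doesn't matter

print(bow_histogram(data.camera()))
```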
Thank you! The clustering step is definitely what was missing from my understanding of the "bag of words" approach.

-Lisa
Hi Anders

On Thu, Jul 18, 2013 at 10:14 AM, Anders Boesen Lindbo Larsen <anders.bll@gmail.com> wrote:
I attach this mail correspondence as it may be relevant for others.
Stefan: Do you think a bag of words implementation would fit into scikit-image? I have some code that I would be happy to polish and contribute. The main problem is that bag-of-words relies on a k-means clustering method, which I would prefer to import from scikit-learn because the one from scipy is slow for a large number of samples. It is my impression that scikit-image tries to stay independent of scikit-learn.
I presume sklearn already has a bag-of-words implementation, so what we really need are more examples of how to use sklearn and skimage together. At the moment, I keep those examples in https://github.com/stefanv/scikit-image-demos but I can foresee including them directly in the gallery in the future.

Stéfan
participants (4)
- Anders Boesen Lindbo Larsen
- Josh Warner
- Lisa Torrey
- Stéfan van der Walt