Object detection in images (HOG)

Hi! I am new to machine learning and I need some help.

I want to detect objects inside cells of microscopy images. I have a lot of annotated images (approx. 50,000 images with an object and 500,000 without an object). So far I have tried extracting features using HOG and classifying with logistic regression and LinearSVC. I have tried several parameters for HOG and several color spaces (RGB, HSV, LAB), but I don't see a big difference: the prediction rate is about 70 %.

I have several questions. How many images should I use to train the descriptor? How many images should I use to test the prediction? I have tried with about 1000 images for training, which gives me 55 % positive, and with 5000, which gives me about 72 % positive. However, it also depends a lot on the test set; sometimes a test set can reach 80-90 % positively detected images.

Here are two examples containing an object and two images without an object:

Object01 <http://labtools.ipk-gatersleben.de/ML/with_object01.jpg>
Object02 <http://labtools.ipk-gatersleben.de/ML/with_object03.jpg>
cell01 <http://labtools.ipk-gatersleben.de/ML/cell01.jpg>
cell02 <http://labtools.ipk-gatersleben.de/ML/cell02.jpg>

Another problem is that sometimes the images contain several objects:

objects <http://labtools.ipk-gatersleben.de/ML/with_object02.jpg>

Should I try to increase the number of examples in the training set? How should I choose the images for the training set, just at random? What else could I try?

Any help or tips would be very much appreciated. Thank you very much in advance!
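A minimal sketch of the kind of HOG + LinearSVC pipeline described above, using scikit-image and scikit-learn; the `positive_paths` / `negative_paths` lists and the HOG parameters are illustrative placeholders, not taken from the original post:

    # Sketch: HOG features + LinearSVC, as described in the post above.
    # `positive_paths` / `negative_paths` are hypothetical lists of image files;
    # all images are assumed to have the same dimensions so the HOG vectors
    # have equal length.
    import numpy as np
    from skimage.io import imread
    from skimage.color import rgb2gray
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_features(path):
        """Load an image, convert to grayscale, and return its HOG descriptor."""
        image = rgb2gray(imread(path))
        return hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    X = np.array([hog_features(p) for p in positive_paths + negative_paths])
    y = np.array([1] * len(positive_paths) + [0] * len(negative_paths))

    clf = LinearSVC()
    clf.fit(X, y)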

Hello!

Firstly, please sign up to the mailing list before posting; if you don't, every post from you has to be manually filtered through.

On to your problem! It looks like there should be plenty of signal to distinguish between object/no-object, but it's key to understand the features you're using. HOG may not be appropriate here: it measures gradients, not image intensity/color. In this case, it looks like there will be many more dark pixels in the object images. Based on the examples you showed, what I would do is simply take the Lab-transformed image, compute a histogram, and use the histogram as the feature vector.

You have a lot of labelled images, so use them! I would split your set into 40k training / 10k test, then do 4-fold cross-validation on the training set. scikit-learn has nice classes for doing cross-validation automatically.

As to the choice of classifier, it might be worth asking on their list, but *by far* the easiest to use "out of the box", without fiddling with parameters, is the random forest.

Hope that helped!

Juan.
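A minimal sketch of this suggested approach: Lab-histogram features, a held-out test set, 4-fold cross-validation, and a random forest. The bin count, split fraction, and the `image_paths` / `labels` variables are assumptions for illustration, not part of the original reply:

    # Sketch: Lab-histogram features, train/test split, 4-fold CV, random forest.
    # `image_paths` and `labels` are hypothetical placeholders for your data.
    import numpy as np
    from skimage.io import imread
    from skimage.color import rgb2lab
    from sklearn.cross_validation import train_test_split, cross_val_score
    # (in newer scikit-learn these live in sklearn.model_selection)
    from sklearn.ensemble import RandomForestClassifier

    def lab_histogram(path, bins=16):
        """Histogram each Lab channel and concatenate into one feature vector."""
        lab = rgb2lab(imread(path))
        ranges = [(0, 100), (-128, 128), (-128, 128)]  # approximate Lab ranges
        hists = [np.histogram(lab[..., c], bins=bins, range=ranges[c])[0]
                 for c in range(3)]
        return np.concatenate(hists).astype(float)

    X = np.array([lab_histogram(p) for p in image_paths])
    y = np.array(labels)

    # Hold out part of the data as a final test set (e.g. the 40k/10k split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=100)
    scores = cross_val_score(clf, X_train, y_train, cv=4)  # 4-fold CV
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    clf.fit(X_train, y_train)
    print("Held-out test accuracy: %.3f" % clf.score(X_test, y_test))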

Hello Juan!

Thank you for your reply! I am sorry about the technical problem; Google told me that I was signed up for this group, so I did not realize. I hope this message will be recognized as coming from a member.

I really appreciate your tips and experience. However, I have one concern about using only intensity/color: I have several images where the cell and the object are stained very lightly, and others where objects which I don't want to detect are stained very darkly. That's why I used HOG (the object I am looking for always has a kind of finger structure). I am giving Lab features a try at the moment and I will see :-)

Thanks a lot for the cross-validation tip and for the advice on how many images to use; this was very helpful.

Cheers,
Stefanie
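Not something proposed in the thread, but as an illustration of the point above that shape and staining carry complementary information: the two feature types can simply be concatenated per image, reusing the hypothetical `hog_features` and `lab_histogram` helpers sketched earlier:

    # Sketch only: combine shape (HOG) and staining (Lab histogram) information
    # by concatenating the two feature vectors for each image.
    import numpy as np

    def combined_features(path):
        """Concatenate HOG and Lab-histogram features for one image."""
        return np.concatenate([hog_features(path), lab_histogram(path)])

    X = np.array([combined_features(p) for p in image_paths])
    # X can then go into the same cross-validated random forest as above.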
participants (2)
- Juan Nunez-Iglesias
- Snowflake