
Hello! Firstly, please sign up to the mailing list before posting; if you don't, every post from you has to be manually filtered through.

On to your problem! It looks like there should be plenty of signal to distinguish object from no-object, but it's key to understand the features you're using. HOG may not be appropriate here: it measures gradients, not image intensity or colour, and in your examples the object images simply contain many more dark pixels. Based on the examples you showed, I would just take the Lab-transformed image, compute a histogram of it, and use that histogram as the feature vector.

You have a lot of labelled images, so use them! I would split your set into 40k training / 10k test, then do 4-fold cross-validation on the training set. scikit-learn has nice classes for doing cross-validation automatically.

As to the choice of classifier, it might be worth asking on their list, but *by far* the easiest to use "out of the box", without fiddling with parameters, is the random forest. A rough sketch of the whole pipeline is below.
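Something along these lines is what I have in mind (untested, so treat it as a sketch rather than a recipe; the object_files/background_files lists, the number of histogram bins, and the forest size are placeholders you would adapt to your data):

import numpy as np
from skimage import io, color
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split, cross_val_score

def lab_histogram(image, bins=16):
    """Concatenate per-channel histograms of the Lab-transformed image."""
    lab = color.rgb2lab(image)
    # fixed per-channel ranges so histograms are comparable across images
    ranges = [(0, 100), (-128, 128), (-128, 128)]
    features = []
    for channel in range(3):
        hist, _ = np.histogram(lab[..., channel], bins=bins,
                               range=ranges[channel])
        features.append(hist.astype(float) / hist.sum())  # normalise away image size
    return np.concatenate(features)

# object_files / background_files: lists of image filenames (placeholders)
X = np.array([lab_histogram(io.imread(f))
              for f in object_files + background_files])
y = np.array([1] * len(object_files) + [0] * len(background_files))

# hold out 20% as a test set (roughly your 40k/10k split), then
# cross-validate on the training portion only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X_train, y_train, cv=4))  # 4-fold CV accuracy
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))                     # accuracy on the held-out set

Once that baseline works, you can tweak the number of bins or the forest size and let the cross-validation scores tell you whether it actually helps.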
Hope that helped!

Juan

On Wed, Apr 22, 2015 at 8:21 PM, Snowflake <luecks@gmail.com> wrote:

Hi! I am new to machine learning and I need some help. I want to detect objects inside cells in microscopy images. I have a lot of annotated images (approx. 50,000 images with an object and 500,000 without an object).

So far I have tried extracting features using HOG and classifying with logistic regression and LinearSVC. I have tried several HOG parameters and colour spaces (RGB, HSV, Lab), but I don't see a big difference: the prediction rate is about 70 %.

I have several questions. How many images should I use to train the descriptor? How many should I use to test the prediction? I have tried with about 1,000 images for training, which gives me 55 % positive, and with 5,000, which gives me about 72 % positive. However, it also depends a lot on the test set; sometimes a test set can reach 80-90 % positively detected images.

Here are two examples containing an object and two images without an object:
object01 <http://labtools.ipk-gatersleben.de/ML/with_object01.jpg>
object02 <http://labtools.ipk-gatersleben.de/ML/with_object03.jpg>
cell01 <http://labtools.ipk-gatersleben.de/ML/cell01.jpg>
cell02 <http://labtools.ipk-gatersleben.de/ML/cell02.jpg>

Another problem is that some images contain several objects:
objects <http://labtools.ipk-gatersleben.de/ML/with_object02.jpg>

Should I try to increase the size of the learning set? How should I choose the images for the training set? Just at random? What else could I try? Any help or tips would be much appreciated. Thank you very much in advance!