[scikit-learn] How to train an image classifier on directories

Joel Nothman joel.nothman at gmail.com
Sun Jan 14 18:59:51 EST 2018


Why not just do the train_test_split over directory names, and later (e.g.
in a Pipeline) read in the images?
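
A minimal sketch of that idea, assuming a layout of root/<class>/<pair_dir>/
with two equally sized images per pair directory. The names list_pair_dirs
and PairLoader are hypothetical, not part of scikit-learn, and the
cat-vs-rest labeling just mirrors the rule in the quoted message below:

```python
import os

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split


def list_pair_dirs(root):
    """Collect pair directories and labels from root/<class>/<pair_dir>/."""
    pair_dirs, labels = [], []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for pair_name in sorted(os.listdir(class_dir)):
            pair_dir = os.path.join(class_dir, pair_name)
            if os.path.isdir(pair_dir):
                pair_dirs.append(pair_dir)
                # Same labeling rule as in the question: 'cat' -> 1, else 0.
                labels.append(1 if class_name == "cat" else 0)
    return pair_dirs, labels


class PairLoader(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: pair-directory paths -> one feature row each."""

    def __init__(self, read_image=None):
        # read_image is any callable path -> array; defaults to cv2.imread.
        self.read_image = read_image

    def fit(self, X, y=None):
        return self  # nothing to learn, images are only read in transform

    def transform(self, X):
        read = self.read_image
        if read is None:
            import cv2  # imported lazily so the split itself does not need it
            read = cv2.imread
        rows = []
        for pair_dir in X:
            # Assumes each pair directory holds (at least) two image files.
            files = sorted(os.listdir(pair_dir))[:2]
            images = [np.asarray(read(os.path.join(pair_dir, f)), dtype=float)
                      for f in files]
            # Flatten both images and join them into a single feature vector.
            rows.append(np.concatenate([im.ravel() for im in images]))
        return np.vstack(rows)  # requires all images to share one shape
```

With this, train_test_split(pair_dirs, labels, test_size=0.25,
random_state=42) splits only the directory names, and the images are read
when PairLoader is used as the first step of a Pipeline in front of whatever
classifier you choose.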

On 15 January 2018 at 10:18, Abdul Abdul <abdul.sw84 at gmail.com> wrote:

> Hello,
>
> I'm trying to train an image classifier, but I'm a bit confused about how
> to label my data. The issue is that each class has subdirectories, each of
> which contains two images. So this is not the usual layout where each class
> directory simply holds the images that belong to that class (e.g. cats
> vs. dogs).
>
> Below are some attempts at grouping the data, but I have not yet figured
> out how to assign the label and pass the pairs of images, along with the
> label, to the image classifier.
>
> This is how I read the two images:
>
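> import cv2
> # img_to_array is assumed to be the Keras helper of that name
> from keras.preprocessing.image import img_to_array
>
> im1 = cv2.imread('img1.jpg')
> im1 = img_to_array(im1)
>
> im2 = cv2.imread('img2.jpg')
> im2 = img_to_array(im2)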
>
> I then *pair* the images as follows:
>
> pair = (im1,im2)
>
> For labeling, this is what I did:
>
> label = root.split(os.path.sep)[-2]
> label = 1 if label == 'cat' else 0
>
> How can I group the above pairs of images (im1, im2) and attach the label
> to them? In particular, I want to pass them to the following scikit-learn
> function:
>
> (trainX, testX, trainY, testY) = train_test_split(data,
>     labels, test_size=0.25, random_state=42)
>
> Thanks.
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>