[scikit-learn] KNearestNeighbour is not running in multithread

LUONGO, ALESSANDRO alessandro.luongo at atos.net
Tue May 30 08:22:30 EDT 2017

Hi everyone!

I'm successfully using scikit-learn on a 384-core machine. I'm experimenting with two deployments:
The first is an Anaconda installation of Python 3.6, which uses MKL as the numpy backend.
The second is a "native" installation of scikit-learn and numpy on Python 3.4.5, where the backend is OpenBLAS.

Both installations work, and I can see a high number of threads with high CPU load (for instance, when I'm doing PCA).

The problem, which I don't know how to debug, is that KNeighborsClassifier uses only one core.
This puzzles me, since I can see that the PR adding parallel KNN was accepted into the main branch and has been available since version 0.17:
 https://github.com/scikit-learn/scikit-learn/pull/4009

scikit-learn should have merged these changes about a year ago, and my version of sklearn is:

> print('The scikit-learn version is {}.'.format(sklearn.__version__))
> The scikit-learn version is 0.18.1.

Do you have any hints on how to use parallel KNN?
I'm classifying a high-dimensional dataset, MNIST (digit images). I first apply PCA to get vectors of dimension 35-50, and then a nonlinear expansion, which gives vectors of dimension 600-100. That's why I need parallelism so badly.

    from sklearn.neighbors import KNeighborsClassifier
    clf = KNeighborsClassifier(algorithm='ball_tree')
    clf = clf.fit(train, train_labels)
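
As a rough illustration of the setup described above (a sketch under stated assumptions, not the actual code): the small digits dataset bundled with scikit-learn stands in for MNIST, and PolynomialFeatures is only an assumed placeholder for the nonlinear expansion, which is not specified here. The n_jobs parameter of KNeighborsClassifier requests parallel neighbor searches; as far as the documentation suggests, it applies to queries such as predict or score, so building the ball tree in fit may still show a single busy core.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data (assumption): 8x8 digit images shipped with scikit-learn, not MNIST.
X, y = load_digits(return_X_y=True)
train, test, train_labels, test_labels = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    PCA(n_components=35),            # reduce to 35 dimensions
    PolynomialFeatures(degree=2),    # assumed expansion: 35 inputs -> 666 features
    # n_jobs=-1 asks for all available cores during neighbor queries.
    KNeighborsClassifier(algorithm='ball_tree', n_jobs=-1),
)
model.fit(train, train_labels)

# The neighbor search, and therefore any parallelism, happens at query time.
accuracy = model.score(test, test_labels)
n_expanded = model.named_steps['polynomialfeatures'].n_output_features_
print(n_expanded, accuracy)
```

Watching CPU usage during model.score rather than during model.fit is one way to check whether the parallel query path is actually being used.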

Thanks for all your amazing work.

