[scikit-learn] Scikit Learn in a Cray computer
Mauricio Reis
reismc at ime.eb.br
Wed Jun 19 16:36:39 EDT 2019
I'd like to understand how parallelism works in the DBSCAN routine in
scikit-learn when running on a Cray computer, and what I can do to
improve the results I'm seeing.
I have adapted the existing example in
[https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py]
to run with 100,000 points, so that the processing time is long enough
to be measured and compared reliably. I varied the parameter "n_jobs = x",
with "x" ranging from 1 to 6. I repeated each experiment several times
and calculated the average processing time.
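
A minimal sketch of the kind of timing loop described above, for anyone who
wants to reproduce it. The function name time_dbscan is hypothetical, and
the data generation and the eps / min_samples values are assumed to follow
the linked example; the exact script used for the measurements is not shown
in this post.

```python
import time
from statistics import mean

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def time_dbscan(n_samples, n_jobs_values, repeats=3, eps=0.3, min_samples=10):
    """Return {n_jobs: mean wall-clock seconds} for DBSCAN.fit on synthetic blobs.

    The blob centers, cluster_std, eps and min_samples are assumptions
    modeled on the plot_dbscan example, not values confirmed by the post.
    """
    centers = [[1, 1], [-1, -1], [1, -1]]
    X, _ = make_blobs(n_samples=n_samples, centers=centers,
                      cluster_std=0.4, random_state=0)
    X = StandardScaler().fit_transform(X)

    results = {}
    for n_jobs in n_jobs_values:
        timings = []
        for _ in range(repeats):
            # Time only the fit call, not the data generation.
            start = time.perf_counter()
            DBSCAN(eps=eps, min_samples=min_samples, n_jobs=n_jobs).fit(X)
            timings.append(time.perf_counter() - start)
        results[n_jobs] = mean(timings)
    return results
```

Under these assumptions, a call such as time_dbscan(100_000, range(1, 7),
repeats=5) would correspond to the setup described here; smaller n_samples
values are useful for a quick check that the script runs at all.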
n_jobs   time
1        21.3
2        15.1
3        14.8
4        15.2
5        15.5
6        15.0
The resulting times are shown in the table above and in the attached
image. As can be seen, there was a real gain only for "n_jobs = 2", and
no further improvement for larger values. Even then, the gain was less
than 30%!
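
To make the size of the gain concrete, the short sketch below derives the
speedup, the time reduction and the parallel efficiency (speedup divided by
n_jobs) from the averages in the table. The helper name summarize is
hypothetical; the numbers are the ones reported above, and the unit of time
does not matter because only ratios are used.

```python
# Mean times copied from the table in this post (unit not stated there).
times = {1: 21.3, 2: 15.1, 3: 14.8, 4: 15.2, 5: 15.5, 6: 15.0}


def summarize(times):
    """Return (n_jobs, speedup, time_reduction, efficiency) per row."""
    base = times[1]  # single-job time is the reference
    rows = []
    for n_jobs, t in sorted(times.items()):
        speedup = base / t            # how many times faster than n_jobs=1
        reduction = 1.0 - t / base    # fraction of time saved
        efficiency = speedup / n_jobs # 1.0 would be perfect scaling
        rows.append((n_jobs, speedup, reduction, efficiency))
    return rows


print(f"{'n_jobs':>6} {'speedup':>8} {'reduction':>10} {'efficiency':>11}")
for n_jobs, speedup, reduction, efficiency in summarize(times):
    print(f"{n_jobs:>6} {speedup:8.2f} {reduction:10.1%} {efficiency:11.1%}")
```

For "n_jobs = 2" this gives a time reduction of about 29%, which is the
"less than 30%" figure mentioned above.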
Why were the gains so small? Why was there no further gain for larger
values of the "n_jobs" parameter? Is it possible to improve the results I
have obtained?
--
Regards,
Mauricio Reis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Time_X_CPUs (Cray).jpg
Type: image/jpeg
Size: 23348 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20190619/0f518db9/attachment-0001.jpg>