[scikit-learn] Scikit Learn in a Cray computer

desitter.gravity at gmail.com desitter.gravity at gmail.com
Sun Jun 30 18:20:05 EDT 2019


Dear All,

Alex Lovell-Troy heads up innovation/cloud supercomputing at Cray (cc'd)
and he is a great resource for all things. I thought he might find this
thread useful.

Best, Alex

On Fri, Jun 28, 2019 at 11:45 PM Olivier Grisel <olivier.grisel at ensta.org>
wrote:

> You have to use a dedicated framework to distribute the computation on a
> cluster like your Cray system.
>
> You can use MPI, or dask with dask-jobqueue, but you also need to run
> parallel algorithms that stay efficient in a distributed setting with a
> high cost for communication between distributed worker nodes.
>
> I am not sure that the DBSCAN implementation in scikit-learn would benefit
> much from naively running in distributed mode.
>
> On Fri, Jun 28, 2019 at 22:06, Mauricio Reis <reismc at ime.eb.br> wrote:
>
>> Sorry, but I have just reread your answer more closely.
>>
>> It seems that the "n_jobs" parameter of the DBSCAN routine brings no
>> benefit to performance. If I want to improve the performance of the
>> DBSCAN routine, I will have to redesign the solution to use MPI
>> resources.
>>
>> Is that correct?
>>
>> ---
>> Ats.,
>> Mauricio Reis
>>
>> On 28/06/2019 16:47, Mauricio Reis wrote:
>> > My laptop has an Intel i7 processor with 4 cores. When I run the program
>> > on Windows 10, the "joblib.cpu_count()" routine returns "4". On this
>> > machine, the same test I ran on the Cray computer caused a 10% increase
>> > in the processing time of the DBSCAN routine when I used the "n_jobs =
>> > 4" parameter, compared to the processing time of that routine without
>> > this parameter. Do you know what causes the longer processing time
>> > when I use "n_jobs = 4" on my laptop?
>> >
>> > ---
>> > Ats.,
>> > Mauricio Reis
>> >
>> > On 28/06/2019 06:29, Brown J.B. via scikit-learn wrote:
>> >>> where you can see "ncpus = 1" (I still do not know why 4 lines were
>> >>> printed -
>> >>>
>> >>> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>> >>
>> >>> #PBS -l select=1:ncpus=8:mpiprocs=8
>> >>> aprun -n 4 p.sh ./ncpus.py
>> >>
>> >> You can request 8 CPUs from a job scheduler, but if each node that the
>> >> script runs on contains only one virtual/physical core, then
>> >> cpu_count() will return 1.
>> >> If that CPU supports multi-threading, you would typically get 2.
>> >>
>> >> For example, on my workstation:
>> >> `--> egrep "processor|model name|core id" /proc/cpuinfo
>> >> processor : 0
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 0
>> >> processor : 1
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 1
>> >> processor : 2
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 0
>> >> processor : 3
>> >> model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
>> >> core id : 1
>> >> `--> python3 -c "from sklearn.externals import joblib;
>> >> print(joblib.cpu_count())"
>> >> 4
>> >>
>> >> It seems that in this situation, if you want to parallelize
>> >> *independent* sklearn calculations (e.g., changing the dataset or random
>> >> seed), you'll request the MPI processes from PBS as you already have,
>> >> but you'll need to place the sklearn computations in a function and then
>> >> take care of distributing that function call across the MPI processes,
>> >> as in the sketch below.
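>> >>
>> >> A minimal sketch of that pattern with mpi4py (the per-rank seed split
>> >> here is illustrative, not something from this thread):
>> >>
>> >>     from mpi4py import MPI
>> >>     from sklearn.cluster import DBSCAN
>> >>     from sklearn.datasets import make_blobs
>> >>
>> >>     comm = MPI.COMM_WORLD
>> >>     rank = comm.Get_rank()
>> >>
>> >>     # Illustrative split: each MPI process runs one independent
>> >>     # fit, using its rank as the random seed.
>> >>     X, _ = make_blobs(n_samples=10000, random_state=rank)
>> >>     labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
>> >>
>> >>     # Collect a per-rank summary on rank 0.
>> >>     n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
>> >>     results = comm.gather((rank, n_clusters), root=0)
>> >>     if rank == 0:
>> >>         print(results)
>> >>
>> >> Launched with something like "aprun -n 4 python script.py", each rank
>> >> then fits its own model in parallel.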
>> >>
>> >> Then again, if the runs are independent, it's a lot easier to write a
>> >> loop that changes the dataset/seed and submits each run to the job
>> >> scheduler, letting the scheduler take care of the parallel
>> >> distribution (a sketch of this loop follows below).
>> >> (I do this when performing 10+ independent runs of sklearn modeling,
>> >> where models use multiple threads during calculations; in my case,
>> >> SLURM then takes care of finding the available nodes to distribute the
>> >> work to.)
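>> >>
>> >> The same loop idea in Python rather than a shell script (a sketch;
>> >> "run_model.sh" is a hypothetical job script that reads the seed from
>> >> its environment):
>> >>
>> >>     import subprocess
>> >>
>> >>     # Submit one scheduler job per seed; with PBS, "-v" passes the
>> >>     # seed to the (hypothetical) job script as an environment variable.
>> >>     for seed in range(10):
>> >>         subprocess.run(["qsub", "-v", f"SEED={seed}", "run_model.sh"],
>> >>                        check=True)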
>> >>
>> >> Hope this helps.
>> >> J.B.


-- 

Alex Morrise, PhD
Co-Founder & CTO, StayOpen.com
Chief Science Officer, MediaJel.com <http://mediajel.com/>
Professional Bio:  Machine Learning Intelligence
<http://www.linkedin.com/in/amorrise>