[SciPy-User] Speeding things up - how to use more than one computer core

Daπid davidmenhur at gmail.com
Sun Apr 7 08:49:45 EDT 2013


This benchmark is poor because it does not take into account many things
that will happen in your real case. A quick glance at your code tells me
(correct me if I am wrong) that you are doing some partial fitting (I think
this is your parallelization target), and then a global fit of some sort. I
don't know about the particular functions you are using, but be aware that
several NumPy functions have a lot of optimizations under the hood, such as
automatic parallelization. Also, a very important issue here, especially
with so many cores, is feeding data to the CPU: quite possibly a fair share
of your computing time is spent with the CPU waiting for data to come in.

The performance of a Python program is quite unpredictable, as there are so
many things going on. I think the best thing you can do is to profile your
code, see where the bottlenecks are, and try the different parallel methods
*on that block* to find out which one works best. Consider also how
difficult each approach is to program and debug; I had a hard time
struggling with multiprocessing on a very simple program until I got it
working.
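
As a starting point, here is a minimal profiling sketch using the
standard-library cProfile (I am reusing the getsqrt toy function from your
script; substitute your real fitting routine):

    import cProfile
    import pstats
    import numpy as np

    def getsqrt(n):
        return np.sqrt(n**2)

    def run_all(a):
        # the loop you would like to parallelize
        return [getsqrt(i) for i in a]

    cProfile.run('run_all(range(10000))', 'bench.prof')
    stats = pstats.Stats('bench.prof')
    stats.sort_stats('cumulative').print_stats(10)  # 10 most expensive calls

The per-call times tell you whether the work is dominated by many cheap
Python calls or by a few heavy NumPy calls, which is what decides how much
threads or processes can help.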

Regarding the difference between processes and threads: both let you run
work concurrently, but threads are bound by the Python GIL: only one line
of Python executes at a time, although this does not apply to C code in
NumPy or to system calls (e.g. waiting for data to be written to a file).
On the other hand, sharing data between threads is much cheaper than
between processes. Multiprocessing, by contrast, will truly execute the
tasks in parallel, using one core per process, but it creates a bigger
overhead. I would say you want multiprocessing, but depending on where the
time is spent in your code, and on whether NumPy releases the GIL there,
you may actually get a better result with multithreading. Again, if you
want to be sure, test it; but if your first try is good enough for you, you
may as well leave it as it is.
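
If you want to compare the two directly, here is a minimal sketch (again
with the getsqrt toy function, not your real fitting code) that runs the
same map once with a process pool and once with a thread pool; the pool
from multiprocessing.dummy has the same API but is backed by threads:

    import numpy as np
    from multiprocessing import Pool, cpu_count
    from multiprocessing.dummy import Pool as ThreadPool  # threads, same API
    from datetime import datetime

    def getsqrt(n):
        return np.sqrt(n**2)

    if __name__ == "__main__":
        a = range(10000)
        jobs = cpu_count() - 1
        for label, PoolClass in [("processes", Pool), ("threads", ThreadPool)]:
            start = datetime.now()
            pool = PoolClass(jobs)
            res = pool.map(getsqrt, a)
            pool.close()
            pool.join()
            print "%s: %s" % (label, datetime.now() - start)

Whichever comes out faster depends on how much of the work happens inside
NumPy with the GIL released, and on how much data has to be shipped to the
worker processes.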

BTW, if you want to read more about memory and parallelization, take a look
at Francesc Alted's fantastic talk at the Advanced Scientific Python
course: https://python.g-node.org/python-summerschool-2012/starving_cpu ,
and apply for the course if you can.


David.


On 7 April 2013 14:11, Troels Emtekær Linnet <tlinnet at gmail.com> wrote:

> Thanks for pointing that out.
> I did not understand the tuple way of calling the function.
>
> But now I get these results:
> Why is joblib so slow?
> And should I go for threading or processes?
>
> -------------------------------
> Method was normal
> Done :0:00:00.040000
> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
> 9999.0] <type 'numpy.float64'>
>
> Method was multi Pool
> Done :0:00:00.422000
> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
> 9999.0] <type 'numpy.float64'>
>
> Method was joblib delayed
> Done :0:00:02.569000
> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
> 9999.0] <type 'numpy.float64'>
>
> Method was handythread
> Done :0:00:00.582000
> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
> 9999.0] <type 'numpy.float64'>
>
> ------------------------------------------------------------------
>
> import numpy as np
> import multiprocessing
> from multiprocessing import Pool
>
> from datetime import datetime
> from joblib import Parallel, delayed
> #
> http://www.scipy.org/Cookbook/Multithreading?action=AttachFile&do=view&target=test_handythread.py
> from handythread import foreach
>
> def getsqrt(n):
>     res = np.sqrt(n**2)
>     return(res)
>
> def main():
>     jobs = multiprocessing.cpu_count()-1
>     a = range(10000)
>     for method in ['normal','multi Pool','joblib delayed','handythread']:
>
>         startTime = datetime.now()
>         sprint=True
>         if method=='normal':
>             res = []
>             for i in a:
>                 b = getsqrt(i)
>                 res.append(b)
>         elif method=='multi Pool':
>
>             pool = Pool(processes=jobs)
>             res = pool.map(getsqrt, a)
>         elif method=='joblib delayed':
>             res = Parallel(n_jobs=jobs)(delayed(getsqrt)(i) for i in a)
>         elif method=='handythread':
>             res = foreach(getsqrt,a,threads=jobs,return_=True)
>
>         else:
>             sprint=False
>         if sprint:
>             print "Method was %s"%method
>             print "Done :%s"%(datetime.now()-startTime)
>             print res[-10:], type(res[-1])
>     return(res)
>
> if __name__ == "__main__":
>     res = main()
>
> Troels
>
> On Sun, Apr 07, 2013 at 12:17:59AM +0200, Troels Emtekær Linnet wrote:
> > Method was joblib delayed
> > Done :0:00:00
>
> Hum, this is fishy, isn't it?
>
> >         elif method=='joblib delayed':
> >             Parallel(n_jobs=-2) #Can also use '-1' for all cores, '-2'
> for all
> > cores=-1
> >             func,res = delayed(getsqrt), a
>
> I have a hard time reading your code, but it seems to me that you haven't
> computed anything here; you have just instantiated the Parallel object.
>
> You need to do:
>
>     res = Parallel(n_jobs=-2)(delayed(getsqrt)(i) for i in a)
>
> I would expect joblib to be on the same order of magnitude speed-wise as
> multiprocessing (hell, it's just a wrapper on multiprocessing). It's just
> going to be more robust code than manually instantiating a Pool (it deals
> better with errors, and can optionally dispatch computation on demand).
>
> Gaël
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user