[SciPy-User] Speeding things up - how to use more than one computer core

Troels Emtekær Linnet tlinnet at gmail.com
Sun Apr 7 08:11:07 EDT 2013


Thanks for pointing that out.
I did not understand the tuple way of calling the function.

But now I get these results:
Why is joblib so slow?
And should I go for threading or processes?
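On the threads-versus-processes question: for pure-Python, CPU-bound work like getsqrt, the GIL serializes bytecode execution, so threads rarely give a speedup; processes sidestep the GIL at the cost of pickling each task. A minimal sketch with the standard-library concurrent.futures (not part of the original benchmark):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def getsqrt(n):
    return math.sqrt(n ** 2)

# Threads share memory and start cheaply, but the GIL serializes
# pure-Python bytecode, so CPU-bound loops see little or no speedup.
# Threads do pay off for I/O, or for NumPy calls that release the GIL.
with ThreadPoolExecutor(max_workers=3) as ex:
    res = list(ex.map(getsqrt, range(10000)))

print(res[-1])
```

For CPU-bound NumPy code that spends its time inside large array operations (which release the GIL), threads can help; for plain Python arithmetic, processes are the only way to use more cores.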

-------------------------------
Method was normal
Done :0:00:00.040000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
9999.0] <type 'numpy.float64'>

Method was multi Pool
Done :0:00:00.422000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
9999.0] <type 'numpy.float64'>

Method was joblib delayed
Done :0:00:02.569000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
9999.0] <type 'numpy.float64'>

Method was handythread
Done :0:00:00.582000
[9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
9999.0] <type 'numpy.float64'>
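One thing these timings suggest: getsqrt does so little work per call that every parallel method is mostly measuring its own dispatch overhead, which is why the plain loop wins. When the per-item operation is this cheap, a vectorized NumPy expression (a rough sketch, not one of the four benchmarked variants) beats them all by running the loop in C:

```python
import numpy as np

a = np.arange(10000, dtype=float)
# One C-level loop over the whole array; no per-item Python call,
# no pickling, no worker startup
res = np.sqrt(a ** 2)
print(res[-10:])
```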

------------------------------------------------------------------

import numpy as np
import multiprocessing
from multiprocessing import Pool
from datetime import datetime
from joblib import Parallel, delayed
# http://www.scipy.org/Cookbook/Multithreading?action=AttachFile&do=view&target=test_handythread.py
from handythread import foreach

def getsqrt(n):
    # trivial per-item workload: sqrt(n**2) == n for n >= 0
    res = np.sqrt(n**2)
    return res

def main():
    jobs = multiprocessing.cpu_count()-1
    a = range(10000)
    for method in ['normal','multi Pool','joblib delayed','handythread']:
        startTime = datetime.now()
        sprint=True
        if method=='normal':
            res = []
            for i in a:
                b = getsqrt(i)
                res.append(b)
        elif method=='multi Pool':
            pool = Pool(processes=jobs)
            res = pool.map(getsqrt, a)
        elif method=='joblib delayed':
            res = Parallel(n_jobs=jobs)(delayed(getsqrt)(i) for i in a)
        elif method=='handythread':
            res = foreach(getsqrt,a,threads=jobs,return_=True)
        else:
            sprint=False
        if sprint:
            print "Method was %s"%method
            print "Done :%s"%(datetime.now()-startTime)
            print res[-10:], type(res[-1])
    return res

if __name__ == "__main__":
    res = main()

Troels

On Sun, Apr 07, 2013 at 12:17:59AM +0200, Troels Emtekær Linnet wrote:
> Method was joblib delayed
> Done :0:00:00

Hum, this is fishy, isn't it?

>         elif method=='joblib delayed':
>             Parallel(n_jobs=-2) # Can also use '-1' for all cores, '-2' for all cores=-1
>             func,res = delayed(getsqrt), a

I have a hard time reading your code, but it seems to me that you haven't
computed anything here; you have just instantiated the Parallel object.

You need to do:

    res = Parallel(n_jobs=-2)(delayed(getsqrt)(i) for i in a)

I would expect joblib to be on the same order of magnitude speed-wise as
multiprocessing (hell, it's just a wrapper on multiprocessing). It's just
going to be more robust than instantiating a Pool manually (it deals
better with errors, and can optionally dispatch computations on demand).

Gaël
_______________________________________________
SciPy-User mailing list
SciPy-User at scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user