[SciPy-User] Speed-up simple function

Nicolau Werneck nwerneck at gmail.com
Mon Jan 10 12:22:29 EST 2011


That is one excellent suggestion, using multiple cores is probably the
best thing to do. I have tried a basic Cython implementation (because
it's such a basic and interesting problem), but I couldn't yet reach a
speedup of even 2.0.

The big villain in the expression is the 'S**1.5'. Removing that from
my Cython code gives a 12x speedup. Avoiding that exponentiation can
be probably beneficial to other techniques too...

I have recently done a code that uses the rsqrt SSE instruction for
approximate square roots. This is very useful... Wouldn't it be nice
to have that on scipy?

++nic



On Mon, Jan 10, 2011 at 05:52:33PM +0100, Francesc Alted wrote:
> A Monday 10 January 2011 16:47:44 g.plantageneto at runbox.com escrigué:
> > > This looks terribly over complicated. What are you trying to do and
> > > where is the polynomial?
> > 
> > I see, sorry, I'll try to put it more clearly. My computation is
> > something like this: for each time step, I compute the value of some
> > polynomial on all array elements, then I average the results to
> > obtain a single number:
> [clip]
> 
> As others have said, Cython is good option.  Perhaps even easier would 
> be Numexpr, that let's you get nice speed-ups on complex expressions, 
> and as a plus, it lets you use any number of cores that your machine 
> has.
> 
> I've made a small demo (attached) on how Numexpr works based on your 
> polynomial.  Here are my results of running it on a 6-core machine:
> 
> Computing with NumPy with 10^5 elements:
> time spent: 0.0169
> Computing with Numexpr with 10^5 elements:
> time spent for 1 threads: 0.0052 speed-up: 3.22
> time spent for 2 threads: 0.0028 speed-up: 6.11
> time spent for 3 threads: 0.0022 speed-up: 7.82
> time spent for 4 threads: 0.0016 speed-up: 10.26
> time spent for 5 threads: 0.0014 speed-up: 12.31
> time spent for 6 threads: 0.0013 speed-up: 13.35
> 
> Cheers,
> 
> -- 
> Francesc Alted

> import math
> import numpy as np
> from numpy.testing import assert_array_almost_equal
> import numexpr as ne
> from time import time
> 
> N = 1e5  # number of elements in arrays
> NTIMES = 100
> 
> def _dens0(S,T, kernel):
>     """Density of seawater at zero pressure"""
> 
>     # --- Define constants ---
>     a0 = 999.842594
>     a1 =   6.793952e-2
>     a2 =  -9.095290e-3
>     a3 =   1.001685e-4
>     a4 =  -1.120083e-6
>     a5 =   6.536332e-9
> 
>     b0 =   8.24493e-1
>     b1 =  -4.0899e-3
>     b2 =   7.6438e-5
>     b3 =  -8.2467e-7
>     b4 =   5.3875e-9
> 
>     c0 =  -5.72466e-3
>     c1 =   1.0227e-4
>     c2 =  -1.6546e-6
> 
>     d0 =   4.8314e-4
> 
>     # --- Computations ---
>     # Density of pure water
>     if kernel == "numpy":
>         SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T
>         # More temperature polynomials
>         RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
>         RC = c0 + (c1 + c2*T)*T
>         result = SMOW + RB*S + RC*(S**1.5) + d0*S*S
>     elif kernel == "numexpr":
>         SMOW = "a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T"
>         # More temperature polynomials
>         RB = "b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T"
>         RC = "c0 + (c1 + c2*T)*T"
>         poly = "(%s) + (%s)*S + (%s)*(S**1.5) + (d0*S*S)" %  (SMOW, RB, RC)
>         result = ne.evaluate(poly)
>     else:
>         raise ValueError, "Unsupported kernel"
> 
>     return result
> 
> S = np.random.rand(N)
> R = np.random.rand(N)
> 
> print "Computing with NumPy:"
> t0 = time()
> for i in range(NTIMES):
>     r1 = _dens0(S, R, "numpy")
> tnp = (time()-t0) / NTIMES
> print "time spent: %.4f" % tnp
> 
> print "Computing with Numexpr with 10^%d elements:" % int(math.log10(N))
> for nthreads in range(1, ne.ncores+1):
>     ne.set_num_threads(nthreads)
>     t0 = time()
>     for i in range(NTIMES):
>         r2 = _dens0(S, R, "numexpr")
>     tne = (time()-t0) / NTIMES
>     print "time spent for %d threads: %.4f" % (nthreads, tne),
>     print "speed-up: %.2f" % (tnp / tne)
> 
> assert_array_almost_equal(r1, r2, 15, "Arrays are not equal")

> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user


-- 
Nicolau Werneck <nwerneck at gmail.com>          C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck           7B9E D6C4 37BB DA64 6F15
Linux user #460716
"C++ is history repeated as tragedy. Java is history repeated as farce."
-- Scott McKay




More information about the SciPy-User mailing list