[SciPy-User] Speed-up simple function
Nicolau Werneck
nwerneck at gmail.com
Mon Jan 10 12:22:29 EST 2011
That is one excellent suggestion, using multiple cores is probably the
best thing to do. I have tried a basic Cython implementation (because
it's such a basic and interesting problem), but I couldn't yet reach a
speedup of even 2.0.
The big villain in the expression is the 'S**1.5'. Removing that from
my Cython code gives a 12x speedup. Avoiding that exponentiation can
be probably beneficial to other techniques too...
I have recently done a code that uses the rsqrt SSE instruction for
approximate square roots. This is very useful... Wouldn't it be nice
to have that on scipy?
++nic
On Mon, Jan 10, 2011 at 05:52:33PM +0100, Francesc Alted wrote:
> A Monday 10 January 2011 16:47:44 g.plantageneto at runbox.com escrigué:
> > > This looks terribly over complicated. What are you trying to do and
> > > where is the polynomial?
> >
> > I see, sorry, I'll try to put it more clearly. My computation is
> > something like this: for each time step, I compute the value of some
> > polynomial on all array elements, then I average the results to
> > obtain a single number:
> [clip]
>
> As others have said, Cython is good option. Perhaps even easier would
> be Numexpr, that let's you get nice speed-ups on complex expressions,
> and as a plus, it lets you use any number of cores that your machine
> has.
>
> I've made a small demo (attached) on how Numexpr works based on your
> polynomial. Here are my results of running it on a 6-core machine:
>
> Computing with NumPy with 10^5 elements:
> time spent: 0.0169
> Computing with Numexpr with 10^5 elements:
> time spent for 1 threads: 0.0052 speed-up: 3.22
> time spent for 2 threads: 0.0028 speed-up: 6.11
> time spent for 3 threads: 0.0022 speed-up: 7.82
> time spent for 4 threads: 0.0016 speed-up: 10.26
> time spent for 5 threads: 0.0014 speed-up: 12.31
> time spent for 6 threads: 0.0013 speed-up: 13.35
>
> Cheers,
>
> --
> Francesc Alted
> import math
> import numpy as np
> from numpy.testing import assert_array_almost_equal
> import numexpr as ne
> from time import time
>
> N = 1e5 # number of elements in arrays
> NTIMES = 100
>
> def _dens0(S,T, kernel):
> """Density of seawater at zero pressure"""
>
> # --- Define constants ---
> a0 = 999.842594
> a1 = 6.793952e-2
> a2 = -9.095290e-3
> a3 = 1.001685e-4
> a4 = -1.120083e-6
> a5 = 6.536332e-9
>
> b0 = 8.24493e-1
> b1 = -4.0899e-3
> b2 = 7.6438e-5
> b3 = -8.2467e-7
> b4 = 5.3875e-9
>
> c0 = -5.72466e-3
> c1 = 1.0227e-4
> c2 = -1.6546e-6
>
> d0 = 4.8314e-4
>
> # --- Computations ---
> # Density of pure water
> if kernel == "numpy":
> SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T
> # More temperature polynomials
> RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
> RC = c0 + (c1 + c2*T)*T
> result = SMOW + RB*S + RC*(S**1.5) + d0*S*S
> elif kernel == "numexpr":
> SMOW = "a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T"
> # More temperature polynomials
> RB = "b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T"
> RC = "c0 + (c1 + c2*T)*T"
> poly = "(%s) + (%s)*S + (%s)*(S**1.5) + (d0*S*S)" % (SMOW, RB, RC)
> result = ne.evaluate(poly)
> else:
> raise ValueError, "Unsupported kernel"
>
> return result
>
> S = np.random.rand(N)
> R = np.random.rand(N)
>
> print "Computing with NumPy:"
> t0 = time()
> for i in range(NTIMES):
> r1 = _dens0(S, R, "numpy")
> tnp = (time()-t0) / NTIMES
> print "time spent: %.4f" % tnp
>
> print "Computing with Numexpr with 10^%d elements:" % int(math.log10(N))
> for nthreads in range(1, ne.ncores+1):
> ne.set_num_threads(nthreads)
> t0 = time()
> for i in range(NTIMES):
> r2 = _dens0(S, R, "numexpr")
> tne = (time()-t0) / NTIMES
> print "time spent for %d threads: %.4f" % (nthreads, tne),
> print "speed-up: %.2f" % (tnp / tne)
>
> assert_array_almost_equal(r1, r2, 15, "Arrays are not equal")
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
--
Nicolau Werneck <nwerneck at gmail.com> C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck 7B9E D6C4 37BB DA64 6F15
Linux user #460716
"C++ is history repeated as tragedy. Java is history repeated as farce."
-- Scott McKay
More information about the SciPy-User
mailing list