numpy and math sqrt timings
On the VPython list Scott Daniels suggested using try/except to deal with the problem of sqrt(5.5) being numpy.float64 and thereby making sqrt(5.5)*(VPython vector) not a (VPython vector), which ends up as a big performance hit on existing programs. I tried his suggestion and did some timing using the program shown below.

Using "from numpy import *", the numpy sqrt(5.5) gives 5.7 microsec per sqrt, whereas using "from math import *" a sqrt is only 0.8 microsec. Why is numpy so much slower than math on this simple case? For completeness I also timed the old Numeric sqrt, which was 14 microsec, so numpy is a big improvement, but still very slow compared to math.

Using Daniels's suggestion of first trying the math sqrt, falling through to the numpy sqrt only if the argument isn't a simple scalar, gives 1.3 microsec per sqrt on the simple case of a scalar argument. Shouldn't/couldn't numpy do something like this internally?

Bruce Sherwood

----------------------------
from math import *
mathsqrt = sqrt
from numpy import *
numpysqrt = sqrt
from time import clock

# 0.8 microsec for "raw" math sqrt
# 5.7 microsec for "raw" numpy sqrt
# 1.3 microsec if we try math sqrt first

def sqrt(x):
    try:
        return mathsqrt(x)
    except TypeError:
        return numpysqrt(x)

# Check that numpy sqrt is invoked on an array:
nums = array([1,2,3])
print sqrt(nums)

x = 5.5
N = 500000
t1 = clock()
for n in range(N):
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
t2 = clock()
for n in range(N):
    pass
t3 = clock()

# t3-t2 is the loop overhead (turns out negligible)
print "%i loops over 10 sqrt's takes %.1f seconds" % (N,t2-t1)
print "Total loop overhead = %.2f seconds (negligible)" % (t3-t2)
print "One sqrt takes %.1f microseconds" % (1e6*((t2-t1)-(t3-t2))/(10*N))
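For readers on current Python, the measurement described in this message can be approximated with the standard `timeit` module instead of a hand-unrolled loop. This is only a sketch, not the original program: the names `sqrt_fallback` and `time_per_call` are made up for illustration, and absolute timings will differ from the 2007 figures quoted in the message.

```python
import math
import timeit

import numpy as np


def sqrt_fallback(x):
    """Try the scalar math.sqrt first; fall back to numpy.sqrt on TypeError."""
    try:
        return math.sqrt(x)
    except TypeError:
        # math.sqrt rejects multi-element arrays, so numpy handles them.
        return np.sqrt(x)


def time_per_call(func, arg, number=100_000):
    """Return the best-of-3 average time of func(arg), in microseconds."""
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=3))
    return 1e6 * best / number


if __name__ == "__main__":
    # Check that numpy sqrt is used for an array argument.
    print(sqrt_fallback(np.array([1, 2, 3])))

    x = 5.5
    for name, func in [("math.sqrt", math.sqrt),
                       ("numpy.sqrt", np.sqrt),
                       ("sqrt_fallback", sqrt_fallback)]:
        print(f"{name:14s} {time_per_call(func, x):.3f} microsec per call")
```

The dispatch relies entirely on `math.sqrt` raising `TypeError`, so any input that `math.sqrt` can coerce to a float takes the scalar path and comes back as a plain Python `float`.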
Hi,

This is a known fact: you should use Python's built-in math functions if you have only one value. If Numpy used math.sqrt for floating-point numbers, it would have to use cmath for complex values as well. I don't know whether an additional test would slow Numpy down; if it would, then we should stay with the current situation. If I have a single value to compute, I always use math instead of Numpy.

Matthieu

2007/12/29, Bruce Sherwood <Bruce_Sherwood@ncsu.edu>:
--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
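Matthieu's point about complex values, and the numpy.float64 return type that started the thread, can be checked directly. The snippet below is a small sketch for current Python and NumPy; it uses only `math.sqrt`, `cmath.sqrt`, `numpy.sqrt` and `numpy.errstate`, and the variable names are chosen for illustration.

```python
import cmath
import math

import numpy as np

# Return types differ: a built-in float versus a numpy scalar.
print(type(math.sqrt(5.5)).__name__)   # float
print(type(np.sqrt(5.5)).__name__)     # float64

# Complex input: math.sqrt raises TypeError; cmath and numpy both accept it.
z = -4 + 0j
try:
    math.sqrt(z)
except TypeError:
    print("math.sqrt rejects complex input")
print(cmath.sqrt(z), np.sqrt(z))

# Negative real input: math.sqrt raises ValueError, while numpy returns nan
# (normally with a RuntimeWarning, silenced here via errstate).
try:
    math.sqrt(-1.0)
except ValueError:
    print("math.sqrt raises ValueError for -1.0")
with np.errstate(invalid="ignore"):
    print(np.sqrt(-1.0))   # nan
```

One consequence for a try/except wrapper that catches only `TypeError`: complex arguments still fall through to the numpy sqrt, but a negative real argument raises `ValueError` from math rather than returning nan, so the wrapper is not a drop-in replacement in that case.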