On the VPython list Scott Daniels suggested using try/except to deal with the problem of sqrt(5.5) being numpy.float64 and thereby making sqrt(5.5)*(VPython vector) not a (VPython vector), which ends up as a big performance hit on existing programs. I tried his suggestion and did some timing using the program shown below.

Using "from numpy import *", the numpy sqrt(5.5) gives 5.7 microsec per sqrt, whereas using "from math import *" a sqrt is only 0.8 microsec. Why is numpy so much slower than math in this simple case? For completeness I also timed the old Numeric sqrt, which was 14 microsec, so numpy is a big improvement, but still very slow compared to math.

Using Daniels's suggestion of first trying the math sqrt, falling through to the numpy sqrt only if the argument isn't a simple scalar, gives 1.3 microsec per sqrt in the simple case of a scalar argument. Shouldn't/couldn't numpy do something like this internally?

Bruce Sherwood

----------------------------

from math import *
mathsqrt = sqrt      # capture math's sqrt before numpy's import shadows it
from numpy import *
numpysqrt = sqrt
from time import clock

# 0.8 microsec for "raw" math sqrt
# 5.7 microsec for "raw" numpy sqrt
# 1.3 microsec if we try math sqrt first
def sqrt(x):
    try:
        return mathsqrt(x)
    except TypeError:
        return numpysqrt(x)

# Check that numpy sqrt is invoked on an array:
nums = array([1,2,3])
print sqrt(nums)

x = 5.5
N = 500000

t1 = clock()
for n in range(N):
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
    y = sqrt(x)
t2 = clock()
for n in range(N):
    pass
t3 = clock()   # t3-t2 is the loop overhead (turns out negligible)

print "%i loops over 10 sqrt's takes %.1f seconds" % (N, t2-t1)
print "Total loop overhead = %.2f seconds (negligible)" % (t3-t2)
print "One sqrt takes %.1f microseconds" % (1e6*((t2-t1)-(t3-t2))/(10*N))
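[Editor's note: the listing above is Python 2 and uses time.clock, which was removed in Python 3.8. For readers who want to reproduce the measurement today, here is a hedged Python 3 sketch of the same try/except dispatch, timed with the standard timeit module instead of a hand-rolled loop; the reported numbers will of course differ from the 2006 figures quoted in the post.]

```python
import math
import timeit

import numpy


def sqrt(x):
    # Try the fast scalar math.sqrt first; math.sqrt raises TypeError
    # on an array argument, and we fall through to numpy.sqrt there.
    try:
        return math.sqrt(x)
    except TypeError:
        return numpy.sqrt(x)


# Check that numpy sqrt is invoked on an array:
print(sqrt(numpy.array([1, 2, 3])))

# Time the scalar case for each variant (microseconds per call):
N = 200_000
for name, stmt in [("math.sqrt", "math.sqrt(5.5)"),
                   ("numpy.sqrt", "numpy.sqrt(5.5)"),
                   ("try/except sqrt", "sqrt(5.5)")]:
    t = timeit.timeit(stmt, globals=globals(), number=N)
    print("%-16s %.2f microsec per sqrt" % (name, 1e6 * t / N))
```

timeit subtracts nothing for loop overhead, but at 200,000 iterations the per-call loop cost is small relative to even math.sqrt, so the comparison between the three variants still holds.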