[Numpy-discussion] numpy.vectorize performance

Fri Jul 14 12:43:38 EDT 2006

On Jul 13, 2006, at 10:17 PM, Tim Hochberg wrote:

> Nick Fotopoulos wrote:
>> Dear all,
>>
>> I often make use of numpy.vectorize to make programs read more
>> like the physics equations I write on paper. numpy.vectorize is
>> basically a wrapper for numpy.frompyfunc. Reading Travis's Scipy
>> Book (mine is dated Jan 6 2005) kind of suggests to me that it
>> returns a full-fledged ufunc exactly like built-in ufuncs.
>>
>> First, is this true?
> Well according to type(), the result of frompyfunc is indeed of  
> type ufunc, so I would say the answer to that is yes.
>> Second, how is the performance?
> A little timing indicates that it's not good (about 30x slower for
> computing x**2 than doing it using x*x on an array). That's not
> frompyfunc's (or vectorize's) fault though. It's calling a Python
> function at each point, so the Python function call overhead is
> going to kill you. Not to mention instantiating an actual Python
> object or objects at each point.

That's unfortunate since I tend to nest functions quite deeply and  
then scipy.integrate.quad over them, which I'm sure results in a  
ridiculous number of function calls.  Are anonymous lambdas any  
different than named functions in terms of performance?
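
For concreteness, here is a minimal sketch of how the two questions above could be checked. As far as I can tell, a lambda and a def both produce the same kind of function object, so the per-call cost should be the same. The names square_named, square_lambda and report are made up for illustration, and the timings will vary by machine and numpy version, so no numbers are quoted.

```python
import timeit

import numpy as np


def square_named(x):
    """Named scalar function."""
    return x * x


# The same computation written as an anonymous function.
square_lambda = lambda x: x * x

# Both wrappers call back into Python once per array element.
vec_named = np.vectorize(square_named)
vec_lambda = np.vectorize(square_lambda)

# frompyfunc returns a real ufunc, but its results are object arrays.
as_ufunc = np.frompyfunc(square_named, 1, 1)

x = np.linspace(0.0, 1.0, 10_000)


def report(repeats=20):
    """Print rough timings for each variant (machine dependent)."""
    cases = [
        ("x * x on the array", lambda: x * x),
        ("vectorize(named def)", lambda: vec_named(x)),
        ("vectorize(lambda)", lambda: vec_lambda(x)),
    ]
    for label, call in cases:
        seconds = timeit.timeit(call, number=repeats)
        print(f"{label:22s} {seconds / repeats:.6f} s per call")


if __name__ == "__main__":
    report()
```

If the two vectorize timings come out about equal, the lambda-versus-def choice is a readability question only, and the cost is in the per-element Python call itself.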

>
>> i.e., are my functions performing approximately as fast as they
>> could be, or would they still gain a great deal of speed by
>> rewriting them in C or some other compiled Python accelerator?
>>
> Can you give examples of what these functions look like? You might  
> gain a great deal of speed by rewriting them in numpy in the  
> correct way. Or perhaps not, but it's probably worth showing some  
> examples so we can offer suggestions or at least admit that we are  
> stumped.

This is by far the slowest bit of my code.  I cache the results, so  
it's not too bad, but any upstream tweak can take a lot of CPU time  
to propagate.

@autovectorized
def dnsratezfunc(z):
     """Take coalescence time into account."""
     def integrand(zf):
         return Pz(z,zf)*NSbirthzfunc(zf)
     return quad(integrand,delayedZ(2e5*secperyear+1,z),5)[0]
dnsratez = lambdap*dnsratezfunc(zs)

where:

# Neutron star formation rate is a delayed version of star formation rate
NSbirthzfunc = autovectorized(lambda z: SFRz(delayedZ(1e8*secperyear,z)))

def Pz(z_c,z_f):
     """Return the probability density per unit redshift of a DNS
     coalescence at z_c given a progenitor formation at z_f. """
     return P(t(z_c,z_f))*dtdz(z_c)

and there are many further nested levels of function calls.  If the  
act of calling a function is more expensive than actually executing  
it and I value speed over readability/code reuse, I can inline Pz's  
function calls and inline the unvectorized NSbirthzfunc to reduce the  
calling stack a bit.  Any other suggestions?
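
One direction I can imagine, sketched below under heavy assumptions: if the integrand accepts arrays, the per-z call to quad could be swapped for a fixed-grid trapezoid rule that handles every z in one broadcasted pass. This is not equivalent to quad (which is adaptive), so accuracy would need checking against the real integrand. The functions kernel and birth_rate are hypothetical stand-ins for Pz and NSbirthzfunc, and the lower limit is simply z rather than the delayedZ(...) value used above.

```python
import numpy as np


def kernel(z, zf):
    """Hypothetical stand-in for Pz(z, zf); accepts arrays."""
    return np.exp(-(zf - z))


def birth_rate(zf):
    """Hypothetical stand-in for NSbirthzfunc(zf); accepts arrays."""
    return 1.0 / (1.0 + zf) ** 2


def rate_scalar(z, upper=5.0, n=2001):
    """Trapezoid rule over zf in [z, upper] for a single z."""
    zf = np.linspace(z, upper, n)
    y = kernel(z, zf) * birth_rate(zf)
    dx = (upper - z) / (n - 1)
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


# Same pattern as the decorated function: one Python-level call per z.
rate_vectorized = np.vectorize(rate_scalar)


def rate_array(zs, upper=5.0, n=2001):
    """Trapezoid rule for all z at once using broadcasting."""
    zs = np.asarray(zs, dtype=float)
    u = np.linspace(0.0, 1.0, n)
    # Row i holds the grid from zs[i] to upper; shape is (len(zs), n).
    zf = zs[:, None] + (upper - zs[:, None]) * u
    y = kernel(zs[:, None], zf) * birth_rate(zf)
    dx = (upper - zs) / (n - 1)
    return dx * (y.sum(axis=1) - 0.5 * (y[:, 0] + y[:, -1]))


zs = np.linspace(0.0, 3.0, 50)
slow = rate_vectorized(zs)  # loops over z in Python
fast = rate_array(zs)       # single pass of array operations
```

The two results should agree to rounding error, since they apply the same rule on the same grids; the array version just avoids the Python call per z. Whether a fixed grid is accurate enough for the real Pz is a separate question.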

Thanks, Tim.

Take care,
Nick