[Numpy-discussion] numpy.vectorize performance

Fri Jul 14 12:56:32 EDT 2006

Nick Fotopoulos wrote:
> On Jul 13, 2006, at 10:17 PM, Tim Hochberg wrote:
>
>   
>> Nick Fotopoulos wrote:
>>     
>>> Dear all,
>>>
>>> I often make use of numpy.vectorize to make programs read more
>>> like the physics equations I write on paper.  numpy.vectorize is
>>> basically a wrapper for numpy.frompyfunc.  Reading Travis's Scipy
>>> Book (mine is dated Jan 6 2005) kind of suggests to me that it
>>> returns a full-fledged ufunc exactly like built-in ufuncs.
>>>
>>> First, is this true?
>>>       
>> Well according to type(), the result of frompyfunc is indeed of  
>> type ufunc, so I would say the answer to that is yes.
>>     
>>> Second, how is the performance?
>>>       
>> A little timing indicates that it's not good (about 30x slower for
>> computing x**2 than doing it using x*x on an array). That's not
>> frompyfunc's (or vectorize's) fault though. It's calling a Python
>> function at each point, so the Python function call overhead is
>> going to kill you. Not to mention instantiating an actual Python
>> object or objects at each point.
>>     
>
> That's unfortunate since I tend to nest functions quite deeply and  
> then scipy.integrate.quad over them, which I'm sure results in a  
> ridiculous number of function calls.  Are anonymous lambdas any  
> different than named functions in terms of performance?
>   
Sorry, no. Under the covers they're the same.
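
A quick way to convince yourself, as a minimal sketch (the names
square_named and square_lambda are made up for illustration, and the
bytecode comparison is only meant as a rough check on CPython):

```python
def square_named(x):
    return x * x

square_lambda = lambda x: x * x

# Both are ordinary function objects of the same type.
same_type = type(square_named) is type(square_lambda)

# The compiled bytecode for the two bodies is the same as well,
# so calling either one costs the same.
same_code = square_named.__code__.co_code == square_lambda.__code__.co_code

print(same_type, same_code)
```

The only difference you are likely to notice is the function's
__name__, which is '<lambda>' for the anonymous version.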

>>> i.e., are my functions performing approximately as fast as they
>>> could be, or would they still gain a great deal of speed by
>>> rewriting them in C or with some other compiled Python accelerator?
>>>
>>>       
>> Can you give examples of what these functions look like? You might  
>> gain a great deal of speed by rewriting them in numpy in the  
>> correct way. Or perhaps not, but it's probably worth showing some  
>> examples so we can offer suggestions or at least admit that we are  
>> stumped.
>>     
>
> This is by far the slowest bit of my code.  I cache the results, so  
> it's not too bad, but any upstream tweak can take a lot of CPU time  
> to propagate.
>
> @autovectorized
> def dnsratezfunc(z):
>      """Take coalescence time into account."""
>      def integrand(zf):
>          return Pz(z,zf)*NSbirthzfunc(zf)
>      return quad(integrand,delayedZ(2e5*secperyear+1,z),5)[0]
> dnsratez = lambdap*dnsratezfunc(zs)
>
> where:
>
> # Neutron star formation rate is a delayed version of star formation rate
> NSbirthzfunc = autovectorized(lambda z: SFRz(delayedZ(1e8*secperyear,z)))
>
> def Pz(z_c,z_f):
>      """Return the probability density per unit redshift of a DNS
>      coalescence at z_c given a progenitor formation at z_f. """
>      return P(t(z_c,z_f))*dtdz(z_c)
>
> and there are many further nested levels of function calls.  If the  
> act of calling a function is more expensive than actually executing  
> it and I value speed over readability/code reuse, I can inline Pz's  
> function calls and inline the unvectorized NSbirthzfunc to reduce the  
> calling stack a bit.  Any other suggestions?
>   
I think I'd try psyco (http://psyco.sourceforge.net/). That's pretty 
painless to try and may result in a significant improvement.
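
Before that, it may be worth measuring how much of the cost is really
per-element call overhead. A rough sketch, assuming a toy x**2 function
rather than your actual integrand (square_vec and square_arr are
hypothetical names, and the ratio you see will depend on your machine
and numpy version):

```python
import timeit

import numpy as np

x = np.linspace(0.0, 1.0, 10_000)

# Element by element: one Python function call per array entry.
square_vec = np.vectorize(lambda v: v ** 2)

# Whole-array expression: the loop runs inside numpy's compiled code.
def square_arr(a):
    return a * a

t_vec = timeit.timeit(lambda: square_vec(x), number=20)
t_arr = timeit.timeit(lambda: square_arr(x), number=20)

print("vectorize: %.4f s, array expression: %.4f s" % (t_vec, t_arr))
```

If the gap is large for your real functions too, then cutting down the
number of Python-level calls (inlining, or evaluating on whole arrays
where the math allows it) is probably where the time is.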

-tim