[Numpy-discussion] optimizing power() for complex and real cases

Tim Hochberg tim.hochberg at cox.net
Fri Feb 17 13:19:00 EST 2006

David M. Cooke wrote:

>Tim Hochberg <tim.hochberg at cox.net> writes:
>>Here's a little progress report: I now have A**2 running as fast as
>>square(A). This is done by special casing stuff in array_power so that
>>A**2 acutally calls square(A) instead of going through power(A,2).
>>Things still need a bunch of cleaning up (in fact right now A**1
>>segfaults, but I know why and it should be an easy fix). However, I
>>think I've discovered why you couldn't get your special cased power to
>>run as fast for A**2 as square(A) or A*A. It appears that the overhead
>>of creating a new array object from the integer 2 is the bottleneck. I
>>was running into the same mysterious overhead, even when dispatching
>>from array_power, until I special cased on PyInt to avoid the array
>>creation in that case.
>Hmm, if that's true about the overhead, that'll hit all computations
>of the type op(x, <some scalar>). Something to look at. That ufunc code
>for casting the arguments is pretty big and hairy, so I'm not going to
>look at right now :-)
Well, it's just a guess based on the fact that the extra time went away 
when I stopped calling PyArray_EnsureArray(o2) for python ints. For what 
it's worth, numpy scalars seem to have much less overhead. As indicated 
below (note that numpy scalars are not currently special cased like 
PyInts are). The overhead from PyInts was closer to 75% versus about 15% 
for numpy scalars. Of course, the percentage of overhead is going to go 
up for smaller arrays.

 >>> Timer('a**2', 'from numpy import arange;a = arange(10000.); b = 
 >>> Timer('a**b', 'from numpy import arange;a = arange(10000.); b = 
 >>> Timer('a*a', 'from numpy import arange;a = arange(10000.); b = 
 >>> Timer('square(a)', 'from numpy import arange, square;a = 
arange(10000.); b = a[2]').timeit(10000)


More information about the NumPy-Discussion mailing list