[Numpy-discussion] optimizing power() for complex and real cases

Thu Feb 23 16:15:01 EST 2006

Tim Hochberg <tim.hochberg at cox.net> writes:

> David M. Cooke wrote:
>
>>Tim Hochberg <tim.hochberg at cox.net> writes:
>>
>>>David M. Cooke wrote:
>>>
>>>
> [SNIP]
>
>>
>>I've gone through your code you checked in, and fixed it up. Looks
>>good. One side effect is that
>>
>>def zl(x):
>>    a = ones_like(x)
>>    a[:] = 0
>>    return a
>>
>>is now faster than zeros_like(x) :-)
>>
>>
> I noticed that ones_like was faster than zeros_like, but I didn't
> think to try that. That's pretty impressive considering how
> ridicuously easy it was to write.

Might be useful to move zeros_like and empty_like to ufuncs :-) (Well,
they'd be better written as regular C functions, though.)

>>One problem I had is that in PyArray_SetNumericOps, the "copy" method
>>wasn't picked up on. It may be due to the order of initialization of
>>the ndarray type, or something (since "copy" isn't a ufunc, it's
>>initialized in a different place). I couldn't figure out how to fiddle
>>that, so I replaced the x.copy() call with a call to PyArray_Copy().
>>
>>
> Interesting. It worked fine here.

Actually, it works fine in the sense that it works. However, if you
time it, it was obvious that it wasn't using an optimized version
(x**1 was as slow as x**1.1).

> I kind of like power and scalar_power. Then ** could be advertised as
> calling scalar_power for scalars and power for arrays. Scalar power
> would do optimizations on integer and half_integer powers. Of course
> there's no real way to enforce that scalar power is passed scalars,
> since presumably it would be a ufunc, short of making _scalar_power a
> ufunc instead and doing something like:
>
> def scalar_power(x, y):
>    "compute x**y, where y is a scalar optimizing integer and half
> integer powers possibly at some minor loss of accuracy"
>    if not is_scalar(y): raise ValuerError("Naughty!!")
>    return _scalar_power(x,y)

I'm tempted to make it have the same signature as power, but call
power if passed an array (or, at the ufunc level, if the stride for
the second argument is non-zero).

>>Another point is to look at __div__, and use reciprocal if the
>>dividend is 1.
>>
>>
> That would be easy, but wouldn't it be just as easy to optimize
> __div__ for scalar divisions. Should probably check that this isn't
> just as fast since it would be a lot more general.

Hmm, scalar division and multiplication could both be speed up:

In [36]: a = arange(10000, dtype=float)
In [37]: %time for i in xrange(100000): a * 1.0
CPU times: user 3.30 s, sys: 0.00 s, total: 3.30 s
Wall time: 3.39
In [38]: b = array([1.])
In [39]: %time for i in xrange(100000): a * b
CPU times: user 2.63 s, sys: 0.00 s, total: 2.63 s
Wall time: 2.71

The differences in times are probably due to creating an array to hold
1.0.

When I have time, I'll look at the ufunc machinery. Since ufuncs are
just passed pointers to data and strides, there's no reason (besides
increasing complexity ;-) to build an ndarray object for scalars.

Alternatively, allow passing scalars to ufuncs: you could define a
ufunc (like our scalar_power) to take an array argument and a scalar
argument. Or, power could be defined to take (array, array) or (array,
scalar), and the ufunc machinery would choose the appropiate one.

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca