[Numpy-discussion] optimizing power() for complex and real cases

Sat Feb 18 17:19:03 EST 2006

OK, I now have a faily clean implementation in C of:

def __pow__(self, p):
    if p is not a scalar:
       return power(self, p)
    elif p == 1:
       return p
    elif p == 2:
       return square(self)
#    elif p == 3:
#       return cube(self)
#    elif p == 4:
#       return power_4(self)
#    elif p == 0:
#       return ones(self.shape, dtype=self.dtype)
#    elif p == -1:
#       return 1.0/self
    elif p == 0.5:
       return sqrt(self)

First a couple of technical questions, then on to the philosophical portion of this note.

1. Is there a nice fast way to get a matrix filled with ones from C. I've been tempted to write a ufunc 'ones_like', but I'm afraid that might be considered inappropriate.

2. Are people aware that array_power is sometimes passed non arrays as its first argument? Despite having the signature:

array_power(PyArrayObject *a1, PyObject *o2) 

This caused me almost no end of headaches, not to mention crashes during numpy.test().

I'll check this into the power_optimization branch RSN, hopefully with a fix for the zero power case. Possibly also after extending it to inplace power as well.

OK, now on to more important stuff. As I've been playing with this my opinion has gone in circles a couple of times. I now think the issue of optimizing integer powers of complex numbers and integer powers of floats are almost completely different. Because complex powers are quite slow and relatively inaccurate, it is appropriate to optimize them for integer powers at the level of nc_pow. This should be just a matter of liberal borrowing from complexobject.c, but I haven't tried it yet.

On the other hand, real powers are fast enough that doing anything at the single element level is unlikely to help. So in that case we're left with either optimizing the cases where the dimension is zero as David has done, or optimizing at the __pow__ (AKA array_power) level as I've done now based on David's original suggestion. This second approach is faster because it avoids the mysterious python scalar -> zero-D array conversion overhead. However, it suffers if we want to optimize lots of different powers since one needs a ufunc for each one. So the question becomes, which powers should we optimize?

My latest thinking on this is that we should optimize only those cases where the optimized result is no less accurate than that produced by pow. I'm going to assume that all C operations are equivalently accurate, so pow(x,2) has roughly the same amount of error as x*x. (Something on the order of 0.5 ULP I'd guess). In that case:
   pow(x, -1) -> 1 / x
   pow(x, 0) -> 1
   pow(x, 0.5) -> sqrt(x)
   pow(x, 1) -> x
   pow(x, 2) -> x*x
can all be implemented in terms of multiply or divide with the same accuracy as the original power methods. Once we get beyond these, the error will go up progressively.

The minimal set described above seems like it should be relatively uncontroversial and it's what I favor. Once we get beyond this basic set, we would need to reach some sort of consensus on how much additional error we are willing to tolerate for optimizing these extra cases. You'll notice that I've changed my mind, yet again, over whether to optimize A**0.5. Since the set of additional ufuncs needed in this case is relatively small, just square and inverse (==1/x), this minimal set works well if optimizing in pow as I've done.

That's the state of my thinking on this at this exact moment. I'd appreciate any comments and suggestions you might have.