[Numpy-discussion] Re: indexing problem

Mon Feb 13 21:46:02 EST 2006

Tim Hochberg <tim.hochberg at cox.net> writes:

> David M. Cooke wrote:
>
>>Gary Ruben <gruben at bigpond.net.au> writes:
>>
>>
>>
>>>Tim Hochberg wrote:
>>><snip>
>>>
>>>
>>>>However, I'm not convinced this is a good idea for numpy. This would
>>>>introduce a discontinuity in a**b that could cause problems in some
>>>>cases. If, for instance, one were running an iterative solver of
>>>>some sort (something I've been known to do), and b was a free
>>>>variable, it could get stuck at b = 2 since things would go
>>>>nonmonotonic there.
>>>>
>>>>
>>>I don't quite understand the problem here. Tim says Python special
>>>cases integer powers but then talks about the problem when b is a
>>>floating type. I think special casing x**2 and maybe even x**3 when
>>>the power is an integer is still a good idea.
>>>
>>>
>>
>>Well, what I had done with Numeric did special case x**0, x**1,
>>x**(-1), x**0.5, x**2, x**3, x**4, and x**5, and only when the
>>exponent was a scalar (so x**y where y was an array wouldn't be). I
>>think this is very useful, as I don't want to microoptimize my code to
>>x*x instead of x**2. The reason for just scalar exponents was so
>>choosing how to do the power was lifted out of the inner loop. With
>>that, x**2 was as fast as x*x.
>>
>>
> This is getting harder to object to since, try as I might I can't get
> a**b to go nonmontonic in the vicinity of b==2. I run out of floating
> point resolution before the slight shift due to special casing at 2
> results in nonmonoticity. I suspect that I could manage it with enough
> work, but it would require some unlikely function of a**b. I'm not
> sure if I'm really on board with this, but let me float a slightly
> modified proposal anyway:
>
>    1. numpy.power stays as it is now. That way in the rare case that
> someone runs into trouble they can drop back to power. Alternatively
> there could be rawpower and power where rawpower has the current
> behaviour. While the name rawpower sounds cool/cheesy, power is used
> infrequently enough that I doubt it matters whether it has these
> special case optimazations.

+1

>
>   2, Don't distinguish between scalars and arrays -- that just makes
> things harder to explain.

Makes the optimizations better, though.

>   3. Python itself special cases all integral powers between -100 and
> 100. Beg/borrow/steal their code. This makes it easier to explain
> since all smallish integer powers are just automagically faster.
>
>   4. Is the performance advantage of special casing a**0.5
> signifigant? If so use the above trick to special case all half
> integral  and integral powers between -N and N. Since sqrt probably
> chews up some time the cutoff. The cutoff probably shifts somewhat if
> we're optimizing half integral as well as integral powers. Perhaps N
> would be 32 or 64.
>
> The net result of this is that a**b would be computed using a
> combination of repeated multiplication and sqrt for  real integral and
> half integral values of b between -N and N. That seems simpler to
> explain and somewhat more useful as well.
>
> It sounds like a fun project although I'm not certain yet that it's a
> good idea.

Basically, my Numeric code looked like this:

#define POWER_UFUNC3(prefix, basetype, exptype, outtype)                \
static void prefix##_power(char **args, int *dimensions,                \
                           int *steps, void *func) {                    \
    int i, cis1=steps[0], cis2=steps[1], cos=steps[2], n=dimensions[0]; \
    int is1=cis1/sizeof(basetype);                                      \
    int is2=cis2/sizeof(exptype);                                       \
    int os=cos/sizeof(outtype);                                         \
    basetype *i1 = (basetype *)(args[0]);                               \
    exptype *i2=(exptype *)(args[1]);                                   \
    outtype *op=(outtype *)(args[2]);                                   \
    if (is2 == 0) {                                                     \
        exptype exponent = i2[0];                                       \
        if (POWER_equal(exponent, 0.0)) {                               \
            for (i = 0; i < n; i++, op += os) {                         \
                POWER_one((*op))                                        \
            }                                                           \
        } else if (POWER_equal(exponent, 1.0)) {                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                *op = *i1;                                              \
            }                                                           \
        } else if (POWER_equal(exponent, 2.0)) {                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_square((*op),(*i1))                               \
            }                                                           \
        } else if (POWER_equal(exponent, -1.0)) {                       \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_inverse((*op),(*i1))                              \
            }                                                           \
        } else if (POWER_equal(exponent, 3.0)) {                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_cube((*op),(*i1))                                 \
            }                                                           \
        } else if (POWER_equal(exponent, 4.0)) {                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_fourth((*op),(*i1))                               \
            }                                                           \
        } else if (POWER_equal(exponent, 0.5)) {                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_sqrt((*op),(*i1))                                 \
            }                                                           \
        } else {                                                        \
            for (i = 0; i < n; i++, i1 += is1, op += os) {              \
                POWER_pow((*op), (*i1), (exponent))                     \
            }                                                           \
        }                                                               \
    } else {                                                            \
        for (i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {                 \
            POWER_pow((*op), (*i1), (*i2))                              \
        }                                                               \
    }                                                                   \
}
#define POWER_UFUNC(prefix, type) POWER_UFUNC3(prefix, type, type, type)

#define FTYPE float
#define POWER_equal(x,y)   x == y
#define POWER_one(o)       o = 1.0;
#define POWER_square(o,x)  o = x*x;
#define POWER_inverse(o,x) o = 1.0 / x;
#define POWER_cube(o,x)    FTYPE y=x; o = y*y*y;
#define POWER_fourth(o,x)  FTYPE y=x, s = y*y; o = s * s;
#define POWER_sqrt(o,x)    o = sqrt(x);
#define POWER_pow(o,x,n)   o = pow(x, n);
POWER_UFUNC(FLOAT, float)
POWER_UFUNC3(FLOATD, float, double, float)

plus similiar definitions for float, double, complex float, and
complex double. Using the POWER_square, etc. macros means the complex
case was easy to add.

The speed comes from the inlining of how to do the power _outside_ the
inner loop. The reason x**2, etc. are slower currently is there is a
function call in the inner loop. Your's and mine C library's pow()
function mostly likely does something like I have above, for a single
case: pow(x, 2.0) is calculated as x*x. However, each time through it
has do decide _how_ to do it.

That's why I limited the optimization to scalar exponents: array
exponents would mean it's about as slow as the pow() call, even if the
checks were inlined into the loop. It would probably be even slower
for the non-optimized case, as you'd check for the special exponents,
then call pow() if it fails (which would likely recheck the exponents).

Maybe a simple way to add this is to rewrite x.__pow__() as something
like the C equivalent of

def __pow__(self, p):
    if p is not a scalar:
       return power(self, p)
    elif p == 1:
       return p
    elif p == 2:
       return square(self)
    elif p == 3:
       return cube(self)
    elif p == 4:
       return power_4(self)
    elif p == 0:
       return ones(self.shape, dtype=self.dtype)
    elif p == -1:
       return 1.0/self
    elif p == 0.5:
       return sqrt(self)

and add ufuncs square, cube, power_4 (etc.).

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca