David M. Cooke wrote:
Tim Hochberg <tim.hochberg@cox.net> writes:
I just checked in some changes that do aggressive optimization of the pow operator in numexpr. Now all integral and half-integral powers in the range [-50, 50] are computed using multiplies and sqrt. (Empirically, 50 seemed to be the closest round number to the break-even point.)
I mention this primarily because I think it's cool. But also, it's the kind of optimization that I don't think would be feasible in numpy itself short of defining a whole pile of special cases, either separate ufuncs or separate loops within a single ufunc, one for each case that needed optimizing. Otherwise the bookkeeping overhead would overwhelm the savings of replacing pow with multiplies.
Now all of the bookkeeping is done in Python, which makes it easy, and it is done once ahead of time and translated into bytecode, which makes it fast. The actual code that does the optimization is included below for those of you interested enough to care, but not interested enough to check it out of the sandbox. It could be made simpler, but I jump through some hoops to avoid unnecessary multiplies. For instance, starting 'r' as OpNode('ones_like', [a]) would simplify things significantly, but at the cost of adding an extra multiply in most cases.
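(For the flavor of it, here is a simplified numerical sketch of the same square-and-multiply idea. This is not the numexpr code itself, which builds OpNodes and compiles them to bytecode rather than evaluating anything directly; the function name and structure below are purely illustrative.)

import numpy as np

def pow_by_multiplies(a, exponent):
    # Simplified sketch: rewrite a ** exponent as multiplies (plus at most
    # one sqrt) when the exponent is integral or half-integral and its
    # magnitude is at most 50; otherwise fall back to the general pow.
    doubled = 2.0 * exponent
    if doubled != round(doubled) or abs(exponent) > 50:
        return a ** exponent
    n = abs(int(round(doubled)))        # |exponent| measured in halves
    negative = exponent < 0
    result = None
    base = a
    k = n // 2                          # integer part of the exponent
    while k:                            # square-and-multiply expansion
        if k & 1:
            result = base if result is None else result * base
        k >>= 1
        if k:
            base = base * base
    if n & 1:                           # half-integral remainder -> one sqrt
        s = np.sqrt(a)
        result = s if result is None else result * s
    if result is None:                  # exponent was exactly zero
        result = np.ones_like(a)
    return 1.0 / result if negative else result

For example, pow_by_multiplies(np.arange(1.0, 5.0), 2.5) agrees with np.arange(1.0, 5.0) ** 2.5 up to floating-point rounding, but uses only multiplies and a single sqrt.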
That brings up an interesting idea. If 'mul' were made smarter, so that it recognized OpNode('ones_like', [a]) and ConstantNode(1), then not only would that speed some 'mul' cases up, it would simplify the code for 'pow' as well. I'll have to look into that tomorrow.
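(Roughly what I have in mind; ConstantNode and OpNode below are minimal stand-ins with made-up attributes, not the real numexpr classes:)

class ConstantNode:
    def __init__(self, value):
        self.value = value

class OpNode:
    def __init__(self, opcode, args):
        self.opcode, self.args = opcode, args

def smart_mul(x, y):
    # Fold away multiplications by one before emitting a 'mul' opcode.
    def is_one(node):
        return ((isinstance(node, ConstantNode) and node.value == 1) or
                (isinstance(node, OpNode) and node.opcode == 'ones_like'))
    if is_one(x):
        return y                        # 1 * y  ->  y
    if is_one(y):
        return x                        # x * 1  ->  x
    return OpNode('mul', [x, y])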
Instead of using a separate ones_like opcode, why don't you just add a ConstantNode(1) instead?
You think I can remember something like that a month or whatever later? You may be giving me too much credit ;-) Hmm..... Ok. I think I remember. IIRC, the reason is that ConstantNode(1) is slightly different from OpNode('ones_like', [a]) in some corner cases. Specifically, a**0 will produce an array of ones in one case, but a scalar 1 in the other. -tim
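(P.S. In plain numpy terms the difference looks like this; 'a' below is just an example array:)

import numpy as np

a = np.array([2.0, 3.0, 4.0])

np.ones_like(a)   # the ones_like formulation: array([1., 1., 1.])
a ** 0            # numpy's own behavior, also an array of ones
1                 # a ConstantNode(1)-style rewrite: a bare scalar, no shape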