David M. Cooke wrote:
Tim Hochberg <tim.hochberg@cox.net> writes:
I just checked in some changes that do aggressive optimization of the pow operator in numexpr. Now all integral and half-integral powers in the range [-50, 50] are computed using multiplies and sqrt. (Empirically, 50 seemed to be the closest round number to the break-even point.)
I mention this primarily because I think it's cool. But also, it's the kind of optimization that I don't think would be feasible in numpy itself short of defining a whole pile of special cases, either separate ufuncs or separate loops within a single ufunc, one for each case that needed optimizing. Otherwise the bookkeeping overhead would overwhelm the savings of replacing pow with multiplies.
Now all of the bookkeeping is done in Python, which makes it easy, and it is done once ahead of time and translated into bytecode, which makes it fast. The actual code that does the optimization is included below for those of you interested enough to care, but not interested enough to check it out of the sandbox. It could be made simpler, but I jump through some hoops to avoid unnecessary multiplies. For instance, starting 'r' as OpNode('ones_like', [a]) would simplify things significantly, but at the cost of adding an extra multiply in most cases.
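(For the flavor of it, here is a simplified numerical sketch of the same square-and-multiply idea. This is not the numexpr code itself, which builds OpNodes and compiles them to bytecode rather than evaluating anything directly; the function name and structure below are purely illustrative.)

import numpy as np

def pow_by_multiplies(a, exponent):
    # Simplified sketch: rewrite a ** exponent as multiplies (plus at most
    # one sqrt) when the exponent is integral or half-integral and its
    # magnitude is at most 50; otherwise fall back to the general pow.
    doubled = 2.0 * exponent
    if doubled != round(doubled) or abs(exponent) > 50:
        return a ** exponent
    n = abs(int(round(doubled)))        # |exponent| measured in halves
    negative = exponent < 0
    result = None
    base = a
    k = n // 2                          # integer part of the exponent
    while k:                            # square-and-multiply expansion
        if k & 1:
            result = base if result is None else result * base
        k >>= 1
        if k:
            base = base * base
    if n & 1:                           # half-integral remainder -> one sqrt
        s = np.sqrt(a)
        result = s if result is None else result * s
    if result is None:                  # exponent was exactly zero
        result = np.ones_like(a)
    return 1.0 / result if negative else result

For example, pow_by_multiplies(np.arange(1.0, 5.0), 2.5) agrees with np.arange(1.0, 5.0) ** 2.5 up to floating-point rounding, but uses only multiplies and a single sqrt.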
That brings up an interesting idea. If 'mul' were made smarter, so that it recognized OpNode('ones_like', [a]) and ConstantNode(1), then not only would that speed some 'mul' cases up, it would simplify the code for 'pow' as well. I'll have to look into that tomorrow.
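(Roughly what I have in mind; ConstantNode and OpNode below are minimal stand-ins with made-up attributes, not the real numexpr classes:)

class ConstantNode:
    def __init__(self, value):
        self.value = value

class OpNode:
    def __init__(self, opcode, args):
        self.opcode, self.args = opcode, args

def smart_mul(x, y):
    # Fold away multiplications by one before emitting a 'mul' opcode.
    def is_one(node):
        return ((isinstance(node, ConstantNode) and node.value == 1) or
                (isinstance(node, OpNode) and node.opcode == 'ones_like'))
    if is_one(x):
        return y                        # 1 * y  ->  y
    if is_one(y):
        return x                        # x * 1  ->  x
    return OpNode('mul', [x, y])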
Instead of using a separate ones_like opcode, why don't you just add a ConstantNode(1) instead?
You think I can remember something like that a month or whatever later? You may be giving me too much credit ;-) Hmm..... Ok. I think I remember. IIRC, the reason is that ConstantNode(1) is slightly different from OpNode('ones_like', [a]) in some corner cases. Specifically, a**0 will produce an array of ones in one case, but a scalar 1 in the other. -tim
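(P.S. In plain numpy terms the difference looks like this; 'a' below is just an example array:)

import numpy as np

a = np.array([2.0, 3.0, 4.0])

np.ones_like(a)   # the ones_like formulation: array([1., 1., 1.])
a ** 0            # numpy's own behavior, also an array of ones
1                 # a ConstantNode(1)-style rewrite: a bare scalar, no shape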