strange exponentiation performance

Sun Nov 23 14:45:36 EST 2003

[Jeff Davis]
> I was doing some thinking about exponentiation algorithms along with a
> friend, and I happened upon some interesting results. Particularly, I
> was able to outperform the ** operator in at least one case, with a
> recursive algorithm. This leads me to believe that perhaps the **
> operator should tune it's algorithm based on inputs or some such
> thing. Here is my data:

I'm taking the liberty of rewriting your functions to stop trying to squash
as much as they can on each line.  It's important to do that if you want to
modify the code quickly when experimenting.

First you rediscovered the "left to right" binary exponentiation algorithm:

def h(b, e):
    if e == 0:
        return 1
    if e == 1:
        return b
    t = h(b, e >> 1)
    return t*t * h(b, e & 1)

When you tried to remove recursion from it, you rediscovered the "right to
left" binary exponentiation algorithm:

def f(b, e):
    n = 1
    while e > 0:
       if e & 1:
           n *= b
       e >>= 1
       b *= b
    return n

That's close to what Python's builtin ** does, but is missing an important
speed trick (explained later).  Note that h and f aren't really the *same*
algorithm!  f squares the current value of b on every trip through the loop.
h does not.  If you print out all the inputs to all the multiplications,
you'll see that they're not at all the same.

Finally (I'll skip g(), as it isn't interesting):

def o(b, e):
    return b**e

> ...
> now, g() was blown out of the water, as expected, but the others were
> close enough for another test at a higher "e" value.
>
> >>> test(f,19,500000)
> 8.02366995811
> >>> test(h,19,500000)
> 3.66968798637
> >>> test(o,19,500000)
> 5.29517292976
> >>>

Your box is faster than mine.  Here under 2.3.2 on Windows:

f 14.6648669941
h  6.53576912117
o  9.59630399437

> Now, that is the interesting part. How did ** not measure up to h()?

The left-to-right exponentiation algorithm can be faster than the
right-to-left one.  The former is clumsier to code in C, though, and nobody
ever bothered to endure that pain in Python's implementation.  If you search
the archives hard enough, you'll eventually find a thread about this from
Christian Tismer.

Neither algorithm is optimal for all inputs, BTW.  Turns out optimal
exponentiation is a very hard problem; Knuth Volume 2 has an extensive
discussion of this ("Evaluation of Powers"); another algorithm based on
first factoring the exponent (expressing it as a product of primes) is
better "on average" than either binary method, but sometimes loses to them.

> It's also interesting that f(), which is supposed to be a more
> efficient version of h(), is lagging behind.

The biggest trick it's missing is that it *always* does

       b *= b

even on the last trip through the loop.  The value of b it computes on the
last trip is never used, and b has grown very large by that time so
computing b*b is *very* expensive then.  Change f like so:

def f(b, e):
    n = 1
    while e > 0:
       if e & 1:
           n *= b
       e >>= 1
       if e:  # new guard:  only square b if the result will be used
           b *= b
    return n

and then it's essentially identical to the builtin **:

f  9.5959586986
h  6.54231046447
o  9.59677081413

> I would like help explaining the following:
> (1) How did my less-than-perfectly-optimized recursive algorithm win
> against the ** operator?

It used left-to-right instead of right-to-left, which on this pair of inputs
is a better approach.

> (2) How can I unwrap and optimize h()?

Make it work left-to-right instead of right-to-left.  That's tedious to do.
You can't get the same sequence of multiplication inputs by going
right-to-left, so the code will have to be more complicated.  Start by
finding the highest bit set in e.  For example,

def q(b, e):
    if e == 0:
        return 1
    if e == 1:
        return b
    e2, numbits = e, 0
    while e2:
        e2 >>= 1
        numbits += 1
    assert numbits >= 2
    result = b
    mask = 1L << (numbits - 2)
    for dummy in range(numbits - 1):
        result *= result
        if e & mask:
            result *= b
        mask >>= 1
    return result

That's bound to be a little faster than h on most inputs, because it also
optimizes away multiplications by 1 (well, unless b happens to be 1).  Let's
see:

f 9.53526793946
o 9.53408287098
h 6.54592768903
q 6.11897031462

Yup, those matter, but not a whale of a lot.  The savings in skipping
multiplications by 1 is proportional to the number of 0 bits in the
exponent.  What *didn't* matter is whether it's recursive or iterative.

> From what I understand,recursion is never supposed to be the most
> efficient. I suspect there are some hidden inefficiencies in using
> while(), but I'd like to know the specifics.

Na, none of that matters.  All these algorithms do a number of
multiplications proportional to log(e, 2), which is insignificantly tiny
compared to the magnitude of the result.  Long-int multiplication is the
only thing that consumes significant time here, so the only thing that
matters to the time is the exact sequence of inputs fed to *.  Even whether
you write the *driver* in Python or C or assembly language is insignificant
compared to that.

> If my algorithm h() is better, why can't ** use a quick test to change
> algorithms based on inputs? Or is mine better in all cases?

See Knuth <wink>.

> ...
> Also note that I'm not exactly an algorithms expert, I just happened
> upon these results while chatting with a friend.

You did good!  The first couple things you tried only nick the surface of
what's known about this problem.  You *could* spend a year learning
everything that's known about it.  In the end, if you implemented all that,
you could easily end up with 1000x more code, which sometimes ran faster.
To become an algorithm expert, you should do that <wink>.  In real life, a
crucial complementary skill is learning when to say "good enough", and move
on.  That's indeed why Python's implementation settled for the
easy-to-code-in-C right-to-left binary exponentiation method.  If that part
of Python had been written in Python instead, I would have used the
easy-to-code-in-Python recursive left-to-right method instead.  Sticking to
easy-to-code methods also has good influence on minimizing the # of bugs, of
course.