[Tutor] A little math and Python: Horner's rule [horner's rule / optimization / C extension]

Tue, 20 Aug 2002 00:45:29 -0700 (PDT)

[warning: this is just recreational programming: there's nothing of real
value in here.  *grin* And it's long, so skip when you start yawning.]

Hi everyone,

I just wanted to share a little Python session I had this evening:

There's a very cute math trick involved when one wants to calculate
polynomials fairly quickly.  For example, let's say we'd like to calculate
some 5th degree polynomial along several points.  Let's see...

###
>>> coeffs = [random.randrange(-42, 42) for i in range(6)]
>>> coeffs
[-35, -12, -40, -39, -7, -22]
###

How random that all those numbers turned up negative!

Anyway, let's say that we'd like to write a small program to draw out the
a curve for the function:

    f(x) = -35 - 12x - 40x^2 - 39x^3 - 7x^4 - 27x^5

Doing this by hand would be tedious.  But we can write this as a Python
function to do the work for us:

###
>>> def f(x):
...     return -35 - 12*x - 40*x**2 - 39*x**3 - 7*x**4 - 27*x**5
...
>>> f(0)
-35
>>> f(0), f(1), f(1.5), f(2)
(-35, -160, -515.09375, -1507)
###

Ok, cool.  That wasn't too difficult.  Problem solved.

Or is it?  One problem we often face is that our programs may be doing too
much work.  This f() function is doing an awful amount of multiplication,
and multiplication can be expensive if we do it a lot.  (Matrix
multiplication, for example, is at the heart of 3d graphics algorithms,
and 3d programmers try to reduce multiplication to a bare minimum if they
can.)

If we've told Python to calculate x**5, it probably needs to calculate
intermediary values, like x**2 and x**3.  In all, it looks like we're
doing 5+4+3+2+1 == 15 multiplications.  It might be good if we could use
those intermediate powers of x and reduce the amount of work the poor
computer has to do... and it turns out that we can!

###
>>> def f2(x):
...     return((((-27 * x - 7) * x - 39) * x - 40) * x - 12) * x - 35
...
>>> f2(0), f2(1), f2(1.5), f2(2)
(-35, -160, -515.09375, -1507)
###

This trick is called "Horner's Rule", and it nests the multiplications
within each other to make ever multiplication count.

But is it worth it to make this look so hideous?  Look at all those
parentheses!  How much faster is f2(), really, compared to f()?

###
>>> sample_points = [x/100.0 for x in range(-10000, 10000)]
>>> import time
>>> def timing_test(function):
...     start_time = time.time()
...     for point in sample_points:
...         function(point)
...     stop_type = time.time()
...     return stop_type - start_time
...
>>> timing_test(f)
0.099812984466552734
>>> timing_test(f2)
0.052543997764587402
###

Bah!  That wasn't worth it at all!  That's less than half a second: who
cares about that?

But what if we up the number of points a bit... say...

###
>>> sample_points = [x/100.0 for x in range(-1000000, 1000000)]
>>> timing_test(f)
10.047876000404358
>>> timing_test(f2)
5.4270470142364502
>>> sample_points = [x/100.0 for x in range(-10000000, 10000000)]
>>> timing_test(f)
120.40663707256317
>>> timing_test(f2)
53.546291947364807
###

That's better.  So there is a savings involved, but we have to really push
the number of f(x)'s hard before it really becomes economical to know
Horner's rule.

Out of curiosity, I wanted to see if there was significant improvement if
I recoded the horner algorithm in C.  I've written an extention module
called 'hornerc' which takes in a list of coefficients, as well as the 'x'
where we'd like to evaluate the function:

###
>>> import hornerc
>>> def f3(x):
...     return hornerc.horner([-35, -12, -40, -39, -7, -27], x)
...
>>> f3(0), f3(1), f3(1.5), f3(2)
(-35, -160, -515.09375, -1507)
>>> timing_test(f3)
102.27121210098267
###

??!

Wow, that wasn't an improvement at all.  In fact, it turned out _worse_
than my Python implementation.  Heck, it turned out almost as bad as if I
hadn't done that Horner optimization in the first place!  What gives?
Hmmm... let me check something:

###
>>> horner = hornerc.horner
>>> l = [-35, -12, -40, -39, -7, -27]
>>> def f4(x):
...     return horner(l, x)
...
>>> f4(0), f4(1), f4(1.5), f4(2)
(-35, -160, -515.09375, -1507)
>>> timing_test(f4)
64.283614039421082
###

One technique we can use to speed up a function is to reduce the amount of
work it does.  *grin* In Python, looking up a module function does take
some work, as does list construction, so I've pulled both of these outside
of the f4() function.

f4() is an improvement, but, surprisingly, it's still a bit slower than
the native Python implementation f2().  Very odd!  I'll have to glare
confusedly at my C code and see where I'm doing something stupid.

I've placed my 'hornerc' extension here for intrepid souls:

    http://hkn.eecs.berkeley.edu/~dyoo/python/horner/

One lesson I'm learning from this is that coding in C is not magic fairy
dust --- It's possible that just recoding an algorithm in C may even slow
things down if we're not careful!  Premature optimization is the root of
all evil.

Still, that was fun.  Forgive me for rambling about Horner's rule; one
thing just kept leading to another.  But I hope this was interesting to
someone.  *grin*