[Edu-sig] Analyzing algorithms...

Sun, 24 Feb 2002 17:45:23 -0500

[Jeffrey Elkner]
> After attending Pai Chou's wonderful presentation, "Algorithm Education
> in Python" at Python10, I got permission from the instructor of an
> algorithms course I am currently taking to do our programming
> assignments in Python (after first assuring him that they would not be
> difficult to read ;-)
>
> Our first assignment is to implement merge and heap sort and then to
> compare them empirically as to the number of assignments and comparisons
> made by each.
>
> I've written the sorts.  Does anyone have any suggestions as to the best
> way to do the empirical analysis?  Is there a better way than just
> creating counters and then littering my code with increment statements?

That's what I usually do, but recently a semi-maddening semi-wonderful
alternative became available via Tools/scripts/trace.py (in your Python
distribution).  The maddening part is that it's almost impossible to figure
out how to use it:  the docs (consisting of a --help message) talk about
many options, but there's no overview, and no explantion of what the
(non-option) arguments are supposed to be.

So here's just-about simplest-possible use explained via example.  Here's a
file with a poor sort function:

C:\Python22>type bubblesort.py
def bubble(a):
    n = len(a)
    changed = 1
    while changed:
        changed = 0
        for i in xrange(n-1):
            if a[i] > a[i+1]:
                a[i], a[i+1] = a[i+1], a[i]
                changed = 1
        n -= 1

import random
a = range(20)
random.shuffle(a)
print a
bubble(a)
print a

Now I run it, just to make sure it works:

C:\Python22>python bubblesort.py
[14, 9, 7, 5, 16, 13, 1, 10, 17, 3, 11, 19, 0, 18, 8, 2, 12, 4, 6, 15]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Now pass the file to trace.py with the "-c" option:

C:\Python22>python Tools/scripts/trace.py -c bubblesort.py
[8, 12, 0, 6, 11, 16, 4, 17, 7, 10, 18, 2, 1, 9, 15, 19, 13, 5, 3, 14]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

That took much longer to run, but it's worth it because of what happens
next:  it left behind a text file named bubblsort.cover, which *annotates*
each source line with the number of times it was executed:

C:\Python22>type bubblesort.cover
    2: def bubble(a):
    2:     n = len(a)
    1:     changed = 1
   18:     while changed:
   16:         changed = 0
  216:         for i in xrange(n-1):
  184:             if a[i] > a[i+1]:
   87:                 a[i], a[i+1] = a[i+1], a[i]
   87:                 changed = 1
   16:         n -= 1
    .
    1: import random
    1: a = range(20)
    1: random.shuffle(a)
    1: print a
    1: bubble(a)
    1: print a
    .
C:\Python22>

It *also* left behind a random.cover file in the Lib directory.  Stopping
(possibly) unwanted stuff like that is what some of the other options aim
at.

Note some odd things about the output; in typical use they don't really
matter, but if you're after exact counts you have to be aware of them:

What trace.py actually counts is the number of SET_LINENO opcodes executed.
Because "def" is an executable statement in Python, the count on "def
bubble(a)" is actually two:  it gets counted once for the time the def
statement is executed (which defines the function), and a second time for
*calling* the "bubble" function.

I can't account for why the count is also two on "n = lan(a)"; looks like a
bug.

The count on "while 1" is 1 larger than "it should be", because under the
covers a special SETUP_LOOP opcode is generated to get the while-loop
started, and is executed one per function call.  It gets counted against the
"while changed:" line because the closest preceding SET_LINENO opcode said
the SETUP_LOOP is due to line #4 in the function.  The count on the "for"
loop is also too large for the same reason, but worse because its hidden
SETUP_LOOP opcode is executed (and counted against the "for") once for each
iteration of the outer while-loop.  As a result, the count on "for" is 16 la
rger than expected (it *should* be 16 larger than the count on "if a[i] >
...", not 32 larger).

The counts on the "plain old lines" within the loops are on the nose,
though, and that's what's really important.

Good luck!