Working around a lack of 'goto' in python

Roy Smith roy at panix.com
Wed Mar 10 08:52:35 EST 2004


In article <9qh2i1-2up.ln1 at home.rogerbinns.com>,
 "Roger Binns" <rogerb at rogerbinns.com> wrote:

> Roy Smith wrote:
> > I find it interesting that many
> > people who come from a C++ background (especially those who migrated
> > from C) tend to dislike exceptions.  It must have been something in the
> > water when they were growing up :-)
> 
> It also affects former Java weenies.  As a Java programmer you expect
> your code to be run under a JIT (and not interpretted) and you are
> aware that exceptions are significantly more expensive than straight
> forward code.  This is less of an issue in Python as interpretters
> don't tend to have as expensive exceptions compared to "normal" code.

But, exceptions are things that (generally) don't happen very often.  
What difference does it make if something that doesn't happen very often 
is slow?

What really tends to surprise C++ types is that in Python, using 
exceptions is often FASTER than the alternative.  For example, if most 
accesses of a dictionary are for keys that are already there, it's 
faster to just try it and catch the occasional KeyError than to check 
before you try the access.

I wrote a little test program which reads a dictionary, generates the 
leading digram of each word (i.e. the first two letters) then builds a 
histogram of digram frequency.  I build the histogram 10 times to dilute 
out the time it takes to read the file in the first place.

Each of the 10 runs results in about 2 million dictionary accesses, of 
which about 500 generate KeyErrors.  I ran it using two algorithms; one 
catches the KeyErrors and deals with problem, the other uses has_key() 
to check first to avoid raising the exceptions in the first place.  
Here's the results (see full code listing below):

Roy-Smiths-Computer:play$ time ./digram.py -h > has
real    0m7.633s
user    0m6.830s
sys     0m0.100s

Roy-Smiths-Computer:play$ time ./digram.py -e > ex
real    0m5.642s
user    0m4.970s
sys     0m0.120s

The output files (has and ex) compare identical.  The has_key() version 
took 6.8 seconds, the exception catching version took 5.0.

I think what's really interesting about this is that even in C++, where 
exceptions are relatively costly, with a hit ratio of 500:2,000,000 
(0.025%), I bet the exception catching version would still be faster.


---------digram.py----------
#!/usr/bin/env python

import sys

def main ():
        digrams = []
        words = open ("/usr/share/dict/words")
        for word in words:
                digram = word.rstrip().lower()[0:2]
                digrams.append (digram)

        count = 10
        if sys.argv[1] == "-e":
                d = countEx (digrams, count)
        else:
                d = countHas (digrams, count)

        digrams = d.keys ()
        digrams.sort ()
        for digram in digrams:
                print "%s: %d" % (digram, d [digram])

def countEx (digrams, count):
        d = {}
        for i in range (count):
                for digram in digrams:
                        try:
                                d[digram] += 1
                        except KeyError:
                                d[digram] = 1
        return d

def countHas (digrams, count):
        d = {}
        for i in range (count):
                for digram in digrams:
                        if d.has_key (digram):
                                d[digram] += 1
                        else:
                                d[digram] = 1
        return d

if __name__ == "__main__":
        main ()
-----------------------------------



More information about the Python-list mailing list