why huge speed difference btwn 1.52 and 2.1?

Neil Schemenauer nas at python.ca
Tue Jun 5 12:41:35 EDT 2001


Your problem is likely that you are calling re.search() in a loop.  In
Python 1.5.2 re.search() was a C function.  In 2.1 it is a Python
function.  Using the pcre.search() function should get your
program up to speed.  Alternatively, you could revise your
program.  It is quite inefficient.  This (untested) version
should be faster:

import re

states = {
'ALABAMA':'AL',
'ALASKA':'AK',
'ARIZONA':'AZ',
'WISCONSIN':'WI',
'WYOMING':'WY'}

def main():

    state_pat = "|".join(states.keys() + states.values())
    # not sure about the characters separating states, guessing
    state_re = re.compile(r"\b(%s)\b" % state_pat)
    
    for year in range(1994, 1998):
        f = open('states/USA%s.TXT' % year)
        counter = 1
        while 1:

            print year, counter
            counter = counter + 1

            #convert city name to allcaps (db outputs in allcaps)
            line = f.readline().upper()

            #check for EOF
            if not line:
                break

            for state in state_re.findall(line):
                filename = states.get(state, state) # use abbreviation
                g = open('states/%s/%s.TXT' % (filename, year), "a")
                g.write(line)
                g.write("\n")
                g.close()
        f.close()

main()
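
The main saving in the version above is that the pattern is compiled
once, before the loop, and each line is scanned once for every state
instead of once per state.  Here is a minimal sketch of just that lookup
step, without the file handling.  The three-entry table and the
abbreviations() helper are made up for illustration and are not part of
the original program; the list() calls are only there so the
concatenation does not depend on keys() and values() returning lists.

```python
import re

# Hypothetical subset of the table; the real one would list every state.
STATES = {
    'ALABAMA': 'AL',
    'ALASKA': 'AK',
    'WYOMING': 'WY',
}

# Build and compile the alternation once, outside any loop.
_names = list(STATES.keys()) + list(STATES.values())
STATE_RE = re.compile(r"\b(%s)\b" % "|".join(_names))

def abbreviations(line):
    """Return the abbreviation for each state mentioned in line."""
    # One pass over the line finds every full name or abbreviation.
    found = STATE_RE.findall(line.upper())
    # Full names map to their abbreviation; abbreviations map to themselves.
    return [STATES.get(name, name) for name in found]
```

Calling abbreviations("Juneau, Alaska") should give ['AK'], and a line
that mentions no state gives an empty list.  The \b anchors are what keep
"AL" from matching inside "ALASKA", so the order of the alternatives
does not matter here.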



