why huge speed difference btwn 1.52 and 2.1?
Neil Schemenauer
nas at python.ca
Tue Jun 5 12:41:35 EDT 2001
Your problem is likely that you are calling re.search() in a loop. In
Python 1.5.2 re.search() was a C function. In 2.1 it is a Python
function. Using the pcre.search() function should get your
program up to speed. Alternatively, you could revise your
program, which is quite inefficient. This (untested) version compiles
one pattern for all the states up front and should be faster:

    import re

    states = {
        'ALABAMA':'AL',
        'ALASKA':'AK',
        'ARIZONA':'AZ',
        'WISCONSIN':'WI',
        'WYOMING':'WY'}

    def main():
        state_pat = "|".join(states.keys() + states.values())
        # not sure about the characters separating states, guessing
        state_re = re.compile(r"\b(%s)\b" % state_pat)
        for year in range(1994, 1998):
            f = open('states/USA%s.TXT' % year)
            counter = 1
            while 1:
                print year, counter
                counter = counter + 1
                # convert city name to allcaps (db outputs in allcaps)
                line = f.readline().upper()
                # check for EOF
                if not line:
                    break
                for state in state_re.findall(line):
                    filename = states.get(state, state) # use abbreviation
                    g = open('states/%s/%s.TXT' % (filename, year), "a")
                    g.write(line)
                    g.write("\n")
                    g.close()
            f.close()

    main()

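
For anyone who wants to try the matching step without the data files, here is a minimal sketch of the same idea: build one alternation from the state names and abbreviations, compile it once, and map each hit to its abbreviation. It is written for current Python 3 rather than 2.1, and the two-entry table, the sample line, and the helper names build_state_re and abbreviations_in are all made up for illustration. The re.escape() call and the longest-first ordering are extra precautions that are not in the version above.

```python
import re

# Hypothetical two-entry table, for illustration only.
STATES = {"ALABAMA": "AL", "WYOMING": "WY"}

def build_state_re(states):
    # One pattern that matches any full name or abbreviation as a whole word.
    # Longer alternatives are listed first so a full name is tried before
    # any shorter alternative.
    names = sorted(list(states.keys()) + list(states.values()),
                   key=len, reverse=True)
    return re.compile(r"\b(%s)\b" % "|".join(re.escape(n) for n in names))

def abbreviations_in(line, state_re, states):
    # Upper-case the line, then map every match to its abbreviation
    # (a matched abbreviation maps to itself).
    return [states.get(m, m) for m in state_re.findall(line.upper())]

# Compile once, outside any loop over input lines.
STATE_RE = build_state_re(STATES)

print(abbreviations_in("Mobile, Alabama to Cheyenne, WY", STATE_RE, STATES))
```

The point is that the pattern is compiled a single time and each line is scanned once with findall(), instead of calling re.search() once per state per line.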