[Pythonmac-SIG] Pattern Matching Speeds?
Richard Gordon
maccgi@bellsouth.net
Wed, 15 Sep 1999 00:49:44 -0400
I finally got around to porting a Perl script to Python. The script
converts two-digit years into four-digit years in text files for
database import. In MacPerl it processes about 1200 test records per
second, while in MacPython the rate is more like 800 per second, and
that's kind of disappointing. I ran several comparison tests on the
same data and the same machine, with both interpreters' memory
partitions set to 10240K. The data is about 36,000 tab-delimited
records, each containing two dates; most need to be fixed, but some
don't.
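The post doesn't say how the records-per-second figures were measured. For scale, 36,000 records at 1200 per second take about 30 seconds, and at 800 per second about 45 seconds. Below is a minimal sketch of one way to get such a figure, written in modern Python 3 rather than 1999 MacPython; `records_per_second` and the `convert` callable it times are hypothetical names, not part of the original script.

```python
import time

def records_per_second(lines, convert):
    """Time `convert` over every line and return lines processed per second.

    `convert` is a hypothetical stand-in for whatever per-record function
    is being benchmarked; the result is wall-clock throughput only.
    """
    start = time.perf_counter()
    for line in lines:
        convert(line)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small inputs.
    return len(lines) / elapsed if elapsed > 0 else float("inf")
```

Running two implementations over the same list of lines on the same machine keeps the numbers comparable, much like the side-by-side tests described above.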
I won't bore you with the Perl code; it's pretty simple and about
what you would expect. I've pasted the Python code below and would
appreciate it if anyone can spot something that might be bogging it
down. Thanks.
##############
import re, sys, string
infile = open("Conkie:Desktop Folder:2to4:fmptest.tab", "r")
outfile = open("Conkie:Desktop Folder:2to4:fixed.tab", "w")
sys.stdout = outfile    # print output goes to the fixed file
data = infile.read()
paragraphs = string.split(data, '\n')
# groups 1 and 2 are month and day including their trailing '/'
matchstr = re.compile(r'(\b\d\d*/)(\d\d*/)(\d\d)\b')
def cent(matchobj):
    centuryA = '19'
    centuryB = '20'
    # length 2 means one digit plus '/', so pad with a leading zero
    if len(matchobj.group(1)) == 2:
        month = '0'+matchobj.group(1)
    else:
        month = matchobj.group(1)
    if len(matchobj.group(2)) == 2:
        day = '0'+matchobj.group(2)
    else:
        day = matchobj.group(2)
    # years 90-99 become 19xx, 00-89 become 20xx
    if matchobj.group(3) > '89':
        newDate = month+day+centuryA+matchobj.group(3)
    else:
        newDate = month+day+centuryB+matchobj.group(3)
    return newDate
for paragraph in paragraphs:
    # note: this stops at the first empty line, not only the trailing one
    if not paragraph:
        break
    else:
        fixed_paragraph = matchstr.sub(cent, paragraph)
        print fixed_paragraph
##############
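One thing that might be worth trying, sketched here in modern Python 3 and not timed against the figures above: run a single `sub()` over the whole file instead of one call per line, and write the result with one `write()` instead of a `print` per line through the redirected `sys.stdout`. That removes the per-line loop and output calls, which are plausible (unmeasured) sources of overhead. The names `DATE`, `widen`, `fix_text`, and `fix_file` are made up for this sketch, and `zfill(2)` stands in for the length checks. Unlike the loop above, which stops at the first empty line, this converts every line in the file.

```python
import re

# Same dates as the original pattern: one or more digits for month and
# day, then a two-digit year not followed by another word character.
DATE = re.compile(r'\b(\d+)/(\d+)/(\d\d)\b')

def widen(m):
    month, day, yy = m.group(1), m.group(2), m.group(3)
    # Same pivot as the original: 90-99 -> 19xx, 00-89 -> 20xx.
    century = '19' if yy > '89' else '20'
    # zfill(2) pads one-digit fields and leaves longer ones unchanged.
    return '%s/%s/%s%s' % (month.zfill(2), day.zfill(2), century, yy)

def fix_text(text):
    # One sub() call over the whole buffer instead of one per line.
    return DATE.sub(widen, text)

def fix_file(src, dst):
    with open(src) as f:
        data = f.read()
    with open(dst, 'w') as f:
        f.write(fix_text(data))
```

Dates that already have a four-digit year (for example `1/2/1999`) are left alone, because the pattern requires a word boundary right after the two-digit year.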
Richard Gordon
--------------------
Gordon Consulting & Design
Database Design/Scripting Languages
mailto:richard@richardgordon.net
http://www.richardgordon.net
770.971.6887 (voice)
770.216.1829 (fax)