Regular expression matching performance
John H. Zouck
jhz at aplexus.jhuapl.edu
Fri Jun 9 12:06:53 EDT 2000
I'm a relative newbie to python, and have been rewriting a fairly simple
perl program in python, mainly because I like python much better for
structural and engineering reasons. The python program runs much more slowly
than the perl program, and I set out to find out why. The python profiler
revealed the program was spending most of its time in regular expression
matching, so I decided to find out whether the regex matching itself really is
that much slower than perl's.
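The sort of profiling run I mean looks roughly like this (a sketch using the
standard profile module against the same kind of search loop, not the actual
program):
import profile
import re
import fileinput
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
def runem():
    # search every line of the files named on the command line
    for line in fileinput.input():
        j2_2_exp.search(line)
# prints a per-function breakdown of where the time goes
profile.run("runem()")
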
I do not want to start any flame wars, and forgive me if this does, but I
would welcome suggestions if I have missed any obvious tricks here or made
any obvious mistakes. I did try to streamline the Python code by following the
suggestions in one of the performance notes, which is why the search
function is called indirectly through the 'findit' variable.
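As I understood the note, the trick is just to hoist the attribute lookup out
of the loop by binding the compiled pattern's search method to a local name,
roughly (a small sketch with a made-up pattern and data):
import re
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+)" )   # any compiled pattern
lines = [ "  J2.2I 42 Loopback Alt = 7", "no match here" ]
# without the trick: an attribute lookup on every iteration
for line in lines:
    j2_2_exp.search(line)
# with the trick: bind the method once, call it through the local name
findit = j2_2_exp.search
for line in lines:
    findit(line)
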
I wrote two simple test programs, one in perl and one in python. All they
do is read a text file and search for a regular expression, line by line.
Here are the programs:
File match.pl:
--------------
#!/usr/local/bin/perl
while (<>) {
    /\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+/;
}
File match.py:
--------------
#!/home/u8/jhz/tcl/Python-1.5.2c1/python -O
import re
import fileinput
j2_2_exp = re.compile(
    r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
findit = j2_2_exp.search
def runem():
    for line in fileinput.input():
        findit(line)
runem()
Here are the timings:
aplexus 1/42> time match.pl art.txt.n
0.27u 0.02s 0:00.29 100.0%
aplexus 1/43> time match.py art.txt.n
13.51u 0.06s 0:14.83 91.5%
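Part of the gap may be fileinput's per-line overhead rather than the regular
expression engine itself; a variant that reads each file directly with
readlines() would look roughly like this (a sketch, which I have not timed):
import sys
import re
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
findit = j2_2_exp.search
def runem():
    # read each named file in one gulp instead of going through fileinput
    for name in sys.argv[1:]:
        for line in open(name).readlines():
            findit(line)
runem()
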
I have a fairly large set of files to process and as it stands I can't
effectively use python.
============================
John H. Zouck
The Johns Hopkins University
Applied Physics Laboratory
============================