Regular expression matching performance
John H. Zouck
jhz at aplexus.jhuapl.edu
Fri Jun 9 12:06:53 EDT 2000
I'm a relative newbie to python, and have been rewriting a fairly simple
perl program in python, mainly because I like python much better for
structural and engineering reasons. The python program runs much more slowly
than the perl program, and I set out to find out why. The python profiler
revealed the program was spending most of its time in regular expression
matching, so I decided to find out whether the regex matching itself really is
that much slower than perl's.
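The sort of profiling run I mean looks roughly like this (a sketch using the
standard profile module against the same kind of search loop, not the actual
program):
import profile
import re
import fileinput
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
def runem():
    # search every line of the files named on the command line
    for line in fileinput.input():
        j2_2_exp.search(line)
# prints a per-function breakdown of where the time goes
profile.run("runem()")
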
I do not want to start any flame wars, and forgive me if this does, but I
would welcome suggestions if I have missed any obvious tricks here or made
any obvious mistakes. I did try to streamline the Python code by following the
suggestions in one of the performance notes, which is why the search
function is called indirectly through the 'findit' variable.
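As I understood the note, the trick is just to hoist the attribute lookup out
of the loop by binding the compiled pattern's search method to a local name,
roughly (a small sketch with a made-up pattern and data):
import re
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+)" )   # any compiled pattern
lines = [ "  J2.2I 42 Loopback Alt = 7", "no match here" ]
# without the trick: an attribute lookup on every iteration
for line in lines:
    j2_2_exp.search(line)
# with the trick: bind the method once, call it through the local name
findit = j2_2_exp.search
for line in lines:
    findit(line)
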
I wrote two simple test programs, one in perl and one in python. All they
do is read a text file and search for a regular expression, line by line.
Here are the programs:
File match.pl:
--------------
#!/usr/local/bin/perl
while (<>) {
    /\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+/;
}
File match.py:
--------------
#!/home/u8/jhz/tcl/Python-1.5.2c1/python -O
import re
import fileinput
j2_2_exp = re.compile(
    r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
findit = j2_2_exp.search
def runem():
    for line in fileinput.input():
        findit(line)
runem()
Here are the timings:
aplexus 1/42> time match.pl art.txt.n
0.27u 0.02s 0:00.29 100.0%
aplexus 1/43> time match.py art.txt.n
13.51u 0.06s 0:14.83 91.5%
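Part of the gap may be fileinput's per-line overhead rather than the regular
expression engine itself; a variant that reads each file directly with
readlines() would look roughly like this (a sketch, which I have not timed):
import sys
import re
j2_2_exp = re.compile( r"\s+J2\.2I\s+(\d+).+Loopback.+Alt\s+=\s+(\d+).+" )
findit = j2_2_exp.search
def runem():
    # read each named file in one gulp instead of going through fileinput
    for name in sys.argv[1:]:
        for line in open(name).readlines():
            findit(line)
runem()
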
I have a fairly large set of files to process and as it stands I can't
effectively use python.
============================
John H. Zouck
The Johns Hopkins University
Applied Physics Laboratory
============================