Faster Regular Expressions

Thu Mar 9 14:16:28 EST 2000

"""

Food for thought...

My work depends on fast regular expressions, but I also enjoy Python's
ease and speed of development.

I ran the following regular expression speed test on a P166 workstation
running Linux.  The results are below.  In this run, function "fastMatch"
was about six times (6x) faster than "slowMatch": 0.320 versus 1.960
cumulative CPU seconds over 1000 calls.

It seems to me that Tatu Ylonen (apparent author of regexp.c) did his
job well and that the re/mo wrapper in re.py is where most of the time
goes: its match and group methods account for 1.550 of slowMatch's
1.960 cumulative seconds.

-- Neill

[Discuss if you like.  Flames---as always---to /dev/null.  I post here
because these results should go in the searchable archive.]

"""

import re

TEST = '"coconuts NI! coconuts NI! coconuts NI! coconuts NI! coconuts"'
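# Compiled pattern, used through the public re.py wrapper.
SLOWQUOTE = re.compile( r"\"(?:(?:\\.)|[^\"\\])*\"")
# Bound match method of the underlying compiled code object.  This relies
# on the ".code" attribute of the pattern object in Python 1.5's re.py.
FASTQUOTE = re.compile( r"\"(?:(?:\\.)|[^\"\\])*\"").code.match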

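def slowMatch( pattern, string) :
  # Go through the re.py wrapper: build a match object, then ask it
  # for group 0 (the whole match).
  mo = pattern.match( string)
  return mo.group()

def fastMatch( pattern, string) :
  # "pattern" is the low-level match function itself.  Its result is
  # indexed by group number; entry 0 holds the (start, end) offsets of
  # the whole match, which are used to slice the input string.
  groups = pattern( string)
  start, end = groups[ 0]
  return string[ start:end]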

def doit() :
  for trial in range( 1000) :
    x = slowMatch( SLOWQUOTE, TEST)
    x = fastMatch( FASTQUOTE, TEST)

import profile
out = 'tmp.prof'
profile.run( 'doit()', out)
import pstats
profObj = pstats.Stats( out)
profObj.sort_stats('cumulative').print_stats()
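
The script above reaches into the ".code" attribute of the Python 1.5
re.py pattern object, which is an implementation detail and not part of
the documented re interface.  As a rough sketch only, and separate from
the Python 1.5 run reported below, a similar comparison can be written
against the documented interface with timeit.  The names QUOTE,
group_match, span_match and compare are illustrative and do not come
from the original post; span_match is only a loose counterpart of
fastMatch, since it still creates a match object.  No timings are
claimed here, because they depend on the interpreter and the machine.

```python
import re
import timeit

TEST = '"coconuts NI! coconuts NI! coconuts NI! coconuts NI! coconuts"'
# Same quoted-string pattern as in the post, written without the
# redundant escapes and inner group.
QUOTE = re.compile(r'"(?:\\.|[^"\\])*"')

def group_match(pattern, string):
    # Counterpart of slowMatch: ask the match object for group 0.
    return pattern.match(string).group()

def span_match(match, string):
    # Loose counterpart of fastMatch (an assumption, not the original
    # technique): "match" is a pre-bound method, and the matched text
    # is recovered by slicing with the (start, end) offsets.
    start, end = match(string).span()
    return string[start:end]

def compare(number=1000):
    # Returns elapsed seconds for each variant over "number" calls.
    t_group = timeit.timeit(lambda: group_match(QUOTE, TEST), number=number)
    t_span = timeit.timeit(lambda: span_match(QUOTE.match, TEST), number=number)
    return t_group, t_span

if __name__ == "__main__":
    t_group, t_span = compare()
    print("group(): %.6f s   bound match + span(): %.6f s" % (t_group, t_span))
```

Both helpers return the same string for the same input, so any
difference that compare() reports comes from how the result is
obtained, not from what is matched.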

"""
Thu Mar  9 14:01:35 2000    tmp.prof

         5003 function calls in 2.510 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    2.510    2.510 profile:0(doit())
        1    0.000    0.000    2.500    2.500 <string>:1(?)
        1    0.220    0.220    2.500    2.500 testre.py:34(doit)
     1000    0.410    0.000    1.960    0.002 testre.py:25(slowMatch)
     1000    0.640    0.001    0.890    0.001 /usr/local/lib/python1.5/re.py:112(match)
     1000    0.660    0.001    0.660    0.001 /usr/local/lib/python1.5/re.py:335(group)
     1000    0.320    0.000    0.320    0.000 testre.py:29(fastMatch)
     1000    0.250    0.000    0.250    0.000 /usr/local/lib/python1.5/re.py:290(__init__)
        0    0.000             0.000          profile:0(profiler)

"""