Regex speed

Andrew Dalke adalke at mindspring.com
Sat Oct 30 05:57:23 CEST 2004


Reinhold Birkenfeld wrote:
> re1 = re.compile(r"\s*<.*>\s*")
> re2 = re.compile(r".*\((.*)\).*")
> re3 = re.compile(r'^"(.*)"$')

BTW, do you want those, or the versions where the inner match can't
run past the closing delimiter:
  re1 = re.compile(r"\s*<[^>]*>\s*")
  re2 = re.compile(r".*\(([^)]*)\).*")

(For the last one, re3, it doesn't make much difference.  There will
only be a single backtrack.)
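
To see how the two forms of re1 differ, here is a small sketch.  The
sample string is made up for illustration, and the names greedy and
negated are mine, not from the original code.

```python
import re

# Original pattern: the greedy .* runs to the LAST '>' in the string.
greedy = re.compile(r"\s*<.*>\s*")
# Suggested pattern: [^>]* cannot cross a '>', so it stops at the FIRST one.
negated = re.compile(r"\s*<[^>]*>\s*")

text = "<a> and <b>"  # hypothetical sample input

print(repr(greedy.match(text).group()))   # '<a> and <b>'
print(repr(negated.match(text).group()))  # '<a> '
```

If the input only ever holds one pair of brackets, both give the same
answer, but the negated class does less backtracking to get there.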

For that matter, what about
  re2 = re.compile(r"\(([^)]*)\)")
then using re2.search instead of re2.match?
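
A sketch of the two approaches side by side, assuming a made-up input
string (the names re2_match and re2_search are only for this example).
Note they are not equivalent when the text has more than one
parenthesized group: the leading .* in the match version skips ahead to
the last one, while search stops at the first.

```python
import re

# Anchored form: the leading .* is greedy, so it finds the LAST "(...)".
re2_match = re.compile(r".*\(([^)]*)\).*")
# Unanchored form for use with search(): finds the FIRST "(...)".
re2_search = re.compile(r"\(([^)]*)\)")

text = "call foo(bar) then baz(qux)"  # hypothetical sample input

m = re2_match.match(text)
s = re2_search.search(text)

print(m.group(1))  # qux
print(s.group(1))  # bar
```

With a single pair of parentheses in the text, both return the same
group.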

> So my question is: Why is the re module implemented in pure Python?
> Isn't it possible to integrate it into the core or rewrite it in C?

It isn't.  It's written in C.  I've not done timing tests
between Perl and Python's engines for a long time, so I can't
provide feedback on that aspect.

One thing about Python is that we tend to use regexps less
often than Perl programmers do.  For example, you may be able
to use plain string methods instead:

def find_text_in_matching_pairs(text, start_c = "<", end_c = ">"):
   i = text.find(start_c)
   if i == -1:
     return None
   j = text.find(end_c, i + 1)  # start after start_c, in case both are the same character
   if j == -1:
     return None
   return text[i+1:j]

If you instead want what your original regexp says (the greedy .*
runs out to the last closing character), use
def find_text_in_matching_pairs(text, start_c = "<", end_c = ">"):
   i = text.find(start_c)
   if i == -1:
     return None
   j = text.rfind(end_c)
   if j < i:  # includes 'j == -1' on find failure
     return None
   return text[i+1:j]


def find1(text):
   return find_text_in_matching_pairs(text, "<", ">")

def find2(text):
   return find_text_in_matching_pairs(text, "(", ")")

def find3(text):
   if text.startswith('"') and text.endswith('"'):
     return text[1:-1]
   return None
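
If you want to check whether the string-method version actually helps
for your data, something like the following rough sketch would do.  The
function names, the sample string and the repeat count are my own
assumptions, and I've added a capture group to the regexp so both
versions return the same thing.  Timings will vary by machine and
Python version, so I'm not quoting any numbers.

```python
import re
import timeit

# Regexp version; the capture group is added here for comparison purposes.
re1 = re.compile(r"<([^>]*)>")

def find_with_str(text, start_c="<", end_c=">"):
    # Return the text between the first start_c and the next end_c.
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.find(end_c, i + 1)
    if j == -1:
        return None
    return text[i + 1:j]

def find_with_re(text):
    # Same result as find_with_str, using the compiled regexp.
    m = re1.search(text)
    if m is None:
        return None
    return m.group(1)

sample = "some leading text <payload> trailing text"  # hypothetical input

# Make sure both approaches agree before comparing their speed.
assert find_with_str(sample) == find_with_re(sample) == "payload"

for func in (find_with_str, find_with_re):
    seconds = timeit.timeit(lambda: func(sample), number=100000)
    print("%s: %.3fs per 100,000 calls" % (func.__name__, seconds))
```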

> Is there a Python interface for the PCRE library out there?

Python used to use PCRE instead of its current sre, back
in the 1.5 days.  Python 1.6/2.x switched to sre in part
because of the need for Unicode support.

The old benchmarks compared pcre and sre and found that
sre was faster.  See
   http://groups.google.com/groups?oi=djq&selm=an_588925502

Which versions of Python and Perl are you using for
the tests?  I know there has been some non-trivial work
for the 2.3 version of Python.

				Andrew
				dalke at dalkescientific.com


