Why TERRIBLE performance of regular expressions in re module ?

rturpin at my-deja.com rturpin at my-deja.com
Mon Oct 4 23:35:47 EDT 1999


While debugging a CGI script, I came across the following
strange behavior: a regular expression that seems to
require exponential time to recognize a string.  For the
first twenty some-odd characters, it performed fine.  I
added another character, and the time became .. well,
noticeable.  Another character, and recognition required
30+ seconds.  Another, 1:15.  Another, 2:2x.  Another,
just under 5 minutes.  Another, 9 minutes, plus.

It doesn't take a genius to recognize this progression
and decide not to proceed.  What puzzles me is this:
regular expressions have a well-recognized EFFICIENT
recognition algorithm: they are turned into state machines
that perform LINEARLY on the length of the input string.

I don't think I am doing anything strange.  (Though I don't
rule it out).  Has anyone seen this behavior, or does
anyone have an explanation for it?

Thanks!

Russell


Sent via Deja.com http://www.deja.com/
Before you buy.




More information about the Python-list mailing list