Why TERRIBLE performance of regular expressions in re module ?
rturpin at my-deja.com
rturpin at my-deja.com
Mon Oct 4 23:35:47 EDT 1999
While debugging a CGI script, I came across the following
strange behavior: a regular expression that seems to
require exponential time to recognize a string. For the
first twenty some-odd characters, it performed fine. I
added another character, and the time became .. well,
noticeable. Another character, and recognition required
30+ seconds. Another, 1:15. Another, 2:2x. Another,
just under 5 minutes. Another, 9 minutes, plus.
It doesn't take a genius to recognize this progression
and decide not to proceed. What puzzles me is this:
regular expressions have a well-recognized EFFICIENT
recognition algorithm: they are turned into state machines
that perform LINEARLY on the length of the input string.
I don't think I am doing anything strange. (Though I don't
rule it out). Has anyone seen this behavior, or does
anyone have an explanation for it?
Thanks!
Russell
Sent via Deja.com http://www.deja.com/
Before you buy.
More information about the Python-list
mailing list