[Python-bugs-list] [ python-Bugs-439997 ] infinite loop in re.match

noreply@sourceforge.net noreply@sourceforge.net
Tue, 10 Jul 2001 10:01:20 -0700


Bugs item #439997, was opened at 2001-07-10 02:44
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=439997&group_id=5470

Category: Regular Expressions
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Dieter Maurer (dmaurer)
>Assigned to: Fredrik Lundh (effbot)
Summary: infinite loop in re.match

Initial Comment:
Python 2.0.1 "re.match" enters infinite loop for
a regular expression that was carefully designed not
to lead to loops (an easy expression).

See attached file.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2001-07-10 10:01

Message:
Logged In: YES 
user_id=31435

Assigned to /F, but I don't see a bug here.  While your 
regexp should be speedy *when* applied to a string that 
actually matches (the regexp does not match the long string 
in your test case), you create an exponential number of 
backtracking possibilities by repeating \s* at both ends of 
the first repeated group (in effect, you're specifying an 
unbounded number of instances of the highly ambiguous 
\s*\s*).  So when the regexp doesn't match, it can be 
unboundedly slow.  This minor rewrite doesn't have that 
problem (it simply moves the first \s* "out of the loop"):

r='</[^>]*>\s*(<[?][^>]*>\s*)+<[^/?][^>]*>'

BTW, your test string doesn't match because it ends with

</list>

It *would* match if it ended with

<list>

It's unclear whether you expected it to match.

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=439997&group_id=5470