[Python-bugs-list] [ python-Bugs-753711 ] Infinite Loop in RE

SourceForge.net noreply@sourceforge.net
Sat, 14 Jun 2003 21:28:36 -0700


Bugs item #753711, was opened at 2003-06-12 20:27
Message generated for change (Comment added) made by bcannon
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=753711&group_id=5470

Category: Regular Expressions
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthew Bogosian (mbogosian)
Assigned to: Fredrik Lundh (effbot)
Summary: Infinite Loop in RE

Initial Comment:
This *may* be similar to
<http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=439997>,
but I don't think the current behavior (infinite
loop/unbound execution) is correct.

Here's the python version:

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - 

#!/usr/bin/python

import re

pattern =
'^UPDATE\s+\w+\s+SET\s+locked_until\s*=\s*(\S+\s*)+,'

# This won't match (missing ',' at end
str = 'UPDATE arena_teams SET locked_until =
CAST(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)'

re.search(pattern, str, re.I)

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - 

When run, this script just pegs the CPU and hangs (I'm
running RedHat 8.0). Note: if I change the pattern
slightly it still doesn't work:

pattern = '^UPDATE\s+\w+\s+SET\s+locked_until\s*=\s*([^
=,]+\s*)+,'

It's not until the pattern actually matches that things
get better (with both the original and modified patterns):

# Pattern now matches (added a ',' at the end)
str = 'UPDATE arena_teams SET locked_until =
CAST(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP),'

I tried doing the same thing in Perl:

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - 

#!/usr/bin/perl

'UPDATE arena_teams SET locked_until =
CAST(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)' =~
'/UPDATE\s+\w+\s+SET\s+locked_until\s*=\s*(\S+\s*)+,/i';

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - 

This of course runs just fine. Any ideas?

----------------------------------------------------------------------

>Comment By: Brett Cannon (bcannon)
Date: 2003-06-14 21:28

Message:
Logged In: YES 
user_id=357491

sjones is right about it taking exponential time.  A regex engine 
works with greedy quantifiers by sucking up all that it can and 
then trying the next part of the regex and continues until it fails 
or succeeds.  If it fails, though, it backtracks one regex character, 
gives up a character it claimed, and tries again.  It does this for a 
long time.

How long did you wait for the regex to complete?  I bet if you left 
it for a *very* long time it would complete.

----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-13 14:42

Message:
Logged In: YES 
user_id=589306

If you change your regex and sample data to the following
simpler forms, you get the same problem:

pattern = '(\S+\s*)+,'
str = 'CAST(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)'

If you remove some letters from str, the search does finish
but it takes several seconds. I believe the problem is that
your regex takes exponential time to execute based on the
length of str. Here is a recent python-dev post that talks
about the problem:

http://mail.python.org/pipermail/python-dev/2003-May/035916.html

The link you provided in the bug report talks about this as
well. You can also do a web search for "regular expression
exponential" to see more information. I believe that any
regex engine (that has similar features to Python's engine)
will have some regular expressions that take exponential
time or space. Perl has some cases as well (search for
"exponential" in perldoc perlre).  This is just how regular
expressions are and IMHO does not indicate a bug in Python.

As far as how to fix the regular expression, I can't say.
I'm sure there is a way to "rephrase" what you want to get
it to work. Maybe try asking in the comp.lang.python newsgroup.

Good luck!


----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-13 05:15

Message:
Logged In: YES 
user_id=589306

If you change your regex and sample data to the following
simpler forms, you get the same problem:

pattern = '(\S+\s*)+,'
str = 'CAST(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)'

If you remove some letters from str, the search does finish
but it takes several seconds. I believe the problem is that
your regex takes exponential time to execute based on the
length of str. Here is a recent python-dev post that talks
about the problem:

http://mail.python.org/pipermail/python-dev/2003-May/035916.html

The link you provided in the bug report talks about this as
well. You can also do a web search for "regular expression
exponential" to see more information. I believe that any
regex engine (that has similar features to Python's engine)
will have some regular expressions that take exponential
time or space. Perl has some cases as well (search for
"exponential" in perldoc perlre).  This is just how regular
expressions are and IMHO does not indicate a bug in Python.

As far as how to fix the regular expression, I can't say.
I'm sure there is a way to "rephrase" what you want to get
it to work. Maybe try asking in the comp.lang.python newsgroup.

Good luck!


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=753711&group_id=5470