[Python-bugs-list] [Bug #112521] re match becomes exponentially slow on larger inputs
Tue, 22 Aug 2000 20:19:17 -0700
Bug #112521, was updated on 2000-Aug-22 18:46
Here is a current snapshot of the bug.
Bug Group: Not a Bug
Summary: re match becomes exponentially slow on larger inputs
Details: The following code will run very slowly, using either python 1.5.2 or 1.6b1:
s = """
arp ARP commands
class Classifier commands
dns DNS commands
email Email commands
event User event commands
frame Frame Relay commands
group Group configuration commands
help On-Line help facility
hl Host list configuration commands
hostdb Host database commands
image Image commands
links Link commands
look Withdraw touch access (go back to look-only access)
measure Measurement commands
mib MIB commands
net Network statistics commands
partition Bandwidth partition commands
policy Policy commands
reset Reset the PacketShaper
r = re.compile ( r"\n\n(\s+.+\n)+\n\n$" )
mo = r.match ( s )
print "groups = %s" % mo.groups()
print "no match"
The above program takes 11 seconds to run on my system. FOr each line that I add in 's', the time DOUBLES. (And halves if I remove lines)
Date: 2000-Aug-22 23:19
By its very nature, this regular expression will match quickly when it *does* match, but consume enormous amounts of time when it doesn't match. Sorry, but "tough luck" is the only answer here. O'Reilly publishes a very good book, "Mastering Regular Expressions", by Jeffrey Friedl, that explains why in detail. Don't have time to write a book here <wink>, but, as a general hint, whenever you have nested repetition ("+" inside a "+" group, etc), unless you know exactly what you're doing you're going to cause *any* "NFA" regexp engine to take exponential time in the cases the regexp doesn't match.
Bring it up on comp.lang.python! I'm sure someone will take the time to show you how to write the regexp in a way that works better. Or buy the book. Or don't use regexps for this problem to begin with.
For detailed info, follow this link: