[Python-bugs-list] [Bug #112521] re match becomes exponentially slow on larger inputs

noreply@sourceforge.net noreply@sourceforge.net
Tue, 22 Aug 2000 20:19:17 -0700


Bug #112521, was updated on 2000-Aug-22 18:46
Here is a current snapshot of the bug.

Project: Python
Category: Modules
Status: Closed
Resolution: Invalid
Bug Group: Not a Bug
Priority: 5
Summary: re match becomes exponentially slow on larger inputs

Details: The following code will run very slowly, using either python 1.5.2 or 1.6b1:

----8<----
import re

s = """

  arp               ARP commands
  class             Classifier commands
  dns               DNS commands
  email             Email commands
  event             User event commands
  frame             Frame Relay commands
  group             Group configuration commands
  help              On-Line help facility
  hl                Host list configuration commands
  hostdb            Host database commands
  image             Image commands
  links             Link commands
  look              Withdraw touch access (go back to look-only access)
  measure           Measurement commands
  mib               MIB commands
  net               Network statistics commands
  partition         Bandwidth partition commands
  policy            Policy commands
  reset             Reset the PacketShaper

"""

r = re.compile ( r"\n\n(\s+.+\n)+\n\n$" )

mo = r.match ( s )

if mo:
        print "groups = %s" % mo.groups()
else:
        print "no match"
----8<----

The above program takes 11 seconds to run on my system. FOr each line that I add in 's', the time DOUBLES. (And halves if I remove lines)


Follow-Ups:

Date: 2000-Aug-22 23:19
By: tim_one

Comment:
By its very nature, this regular expression will match quickly when it *does* match, but consume enormous amounts of time when it doesn't match.  Sorry, but "tough luck" is the only answer here.  O'Reilly publishes a very good book, "Mastering Regular Expressions", by Jeffrey Friedl, that explains why in detail.  Don't have time to write a book here <wink>, but, as a general hint, whenever you have nested repetition ("+" inside a "+" group, etc), unless you know exactly what you're doing you're going to cause *any* "NFA" regexp engine to take exponential time in the cases the regexp doesn't match.

Bring it up on comp.lang.python!  I'm sure someone will take the time to show you how to write the regexp in a way that works better.  Or buy the book.  Or don't use regexps for this problem to begin with.
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=112521&group_id=5470