catastrophic regexp, help!

Peter Pearson ppearson at nowhere.invalid
Wed Jun 11 11:15:32 EDT 2008


On Tue, 10 Jun 2008 21:20:14 -0700 (PDT), cirfu <circularfunc at yahoo.se> wrote:
> pat = re.compile("(\w* *)*")
> this matches all sentences.
> if fed the string "are you crazy? i am" it will return "are you
> crazy".
>
> i want to find a in a big string a sentence containing Zlatan
> Ibrahimovic and some other text.
> ie return the first sentence containing the name Zlatan Ibrahimovic.
>
> patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
> should do this according to regexcoach but it seems to send my
> computer into 100%CPU-power and not closable.

Some regular expressions are exponentially expensive to
evaluate.  This web page

  http://www.cs.rice.edu/~scrosby/hash/

drew my attention to the (contrived) example "(x+x+)+y",
whose behavior (under Python 2.4.3) is this:

           seconds to scan ...
           -------------------
 # x's     xxx...xy    xxx...x
 -----     --------    -------
   10          0.0        0.0 
   11          0.0        0.0 
   12          0.0        0.0 
   13          0.0        0.01 
   14          0.0        0.0 
   15          0.0        0.01 
   16          0.0        0.03 
   17          0.0        0.04 
   18          0.0        0.09 
   19          0.0        0.16 
   20          0.0        0.33 
   21          0.0        0.65 
   22          0.0        1.3 
   23          0.0        2.58 
   24          0.0        5.18 
   25          0.0        10.27 
   26          0.0        20.41 

I'm no regular-expression guru, but your "(\w* *)*..." example does
bear a remarkable resemblance to "(x+x+)+y".


-- 
To email me, substitute nowhere->spamcop, invalid->net.



More information about the Python-list mailing list