catastrophic regexp, help!
Peter Pearson
ppearson at nowhere.invalid
Wed Jun 11 11:15:32 EDT 2008
On Tue, 10 Jun 2008 21:20:14 -0700 (PDT), cirfu <circularfunc at yahoo.se> wrote:
> pat = re.compile("(\w* *)*")
> this matches all sentences.
> if fed the string "are you crazy? i am" it will return "are you
> crazy".
>
> i want to find a in a big string a sentence containing Zlatan
> Ibrahimovic and some other text.
> ie return the first sentence containing the name Zlatan Ibrahimovic.
>
> patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
> should do this according to regexcoach but it seems to send my
> computer into 100%CPU-power and not closable.
Some regular expressions are exponentially expensive to
evaluate. This web page
http://www.cs.rice.edu/~scrosby/hash/
drew my attention to the (contrived) example "(x+x+)+y",
whose behavior (under Python 2.4.3) is this:
seconds to scan ...
-------------------
# x's xxx...xy xxx...x
----- -------- -------
10 0.0 0.0
11 0.0 0.0
12 0.0 0.0
13 0.0 0.01
14 0.0 0.0
15 0.0 0.01
16 0.0 0.03
17 0.0 0.04
18 0.0 0.09
19 0.0 0.16
20 0.0 0.33
21 0.0 0.65
22 0.0 1.3
23 0.0 2.58
24 0.0 5.18
25 0.0 10.27
26 0.0 20.41
I'm no regular-expression guru, but your "(\w* *)*..." example does
bear a remarkable resemblance to "(x+x+)+y".
--
To email me, substitute nowhere->spamcop, invalid->net.
More information about the Python-list
mailing list