catastrophic regexp, help!

Maric Michaud maric at aristote.info
Wed Jun 11 09:08:53 CEST 2008


Le Wednesday 11 June 2008 06:20:14 cirfu, vous avez écrit :
> pat = re.compile("(\w* *)*")
> this matches all sentences.
> if fed the string "are you crazy? i am" it will return "are you
> crazy".
>
> i want to find a in a big string a sentence containing Zlatan
> Ibrahimovic and some other text.
> ie return the first sentence containing the name Zlatan Ibrahimovic.
>
>
> patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
> should do this according to regexcoach but it seems to send my
> computer into 100%CPU-power and not closable.

This kind of regexp are quite often harmfull, while perfectly valid, if you 
take the time it will return, this check too many things to be practical.

Read it, sequentially to make it sensible : for each sequence of word + space, 
trying with the longest first, does the string 'zlatan' follow ?

"this is zlatan example.'
compare with 'this is zlatan example', 'z'=='.', false
compare with 'this is zlatan ', 'z'=='e', false
compare with 'this is zlatan', 'z'==' ', false
compare with 'this is ', "zlatan"=="zlatan", true
compare with 'this is', 'z'==' ', false
compare with 'this ', 'z'=='i', false
compare with 'this', 'z'==' ', false
...

ouch !

The most simple are your regex, better they are, two short regex are better 
then one big, etc...
Don't do premature optimization (especially with regexp).

In [161]: s="""pat = re.compile("(\w* *)*")
this matches all sentences.
if fed the string "are you crazy? i am" it will return "are you
crazy".
i want to find a in a big string a sentence containing Zlatan
Ibrahimovic and some other text.
ie return the first sentence containing the name Zlatan Ibrahimovic.
patzln = re.compile("(\w* *)* zlatan ibrahimovic (\w* *)*")
should do this according to regexcoach but it seems to send my
computer into 100%CPU-power and not closable.
"""

In [172]: list(e[0] for e in re.findall("((\w+\s*)+)", s, re.M) if 
re.findall('zlatan\s+ibrahimovic', e[0], re.I))
Out[172]:
['i want to find a in a big string a sentence containing Zlatan\nIbrahimovic 
and some other text',
 'ie return the first sentence containing the name Zlatan Ibrahimovic',
 'zlatan ibrahimovic ']



-- 
_____________

Maric Michaud



More information about the Python-list mailing list