Pathological regular expression

MRAB google at mrabarnett.plus.com
Sat Apr 11 11:38:58 EDT 2009


Steven D'Aprano wrote:
> On Thu, 09 Apr 2009 02:56:00 -0700, David Liang wrote:
> 
>> Hi all,
>> I'm having a weird problem with a regular expression (tested in 2.6 and
>> 3.0):
>>
>> Basically, any of these:
>> _re_comments = re.compile(r'^(([^\\]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
>> _re_comments = re.compile(r'^(([^#]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
>> _re_comments = re.compile(r'^(([^"]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
>>
>> followed by for example,
>> line = r'~/.[m]ozilla/firefox/*.default/chrome'
>> print(_re_comments.sub(r'\1', line))
>>
>> ...hangs the interpreter.
> 
> 
> I can confirm the first one hangs the interpreter in Python 2.5 as well. 
> I haven't tested the other two.
> 
> To my mind, this is a bug in the RE engine. Is there any reason to not 
> treat it as a bug?
> 
It's not a bug but one of those pathological cases where there are a LOT
of possibilities to try (we're talking about exponential growth here).

Hmm, maybe the re module needs to let the user set a timeout control
that raises an exception if it takes longer than n seconds (it can
already be interrupted by ^C)...



More information about the Python-list mailing list