Cancel or timeout a long running regular expression

Terry Reedy tjreedy at
Sat Sep 17 00:01:27 CEST 2011

On 9/16/2011 9:57 AM, Nobody wrote:
>>I wonder if there are any
>> heuristics for detecting exponential time re's.
> Exponential growth results from non-determinism, i.e. if there are
> multiple transitions for a given character from a given state.
> Common patterns include:
> 	...(a...)?a...
> 	...(a...)*a...
> 	...(a...)+a...
> with a choice between matching the "a" at the start of the bracketed
> pattern or the "a" following it.
> 	(xxxa...|xxxb...|xxxc...)
> with a choice between branches which cannot be resolved until more data
> has been read.
> For (as opposed to re.match):
> 	axxxa...
> When axxx has been read, a following "a" gives a choice between
> continuing the existing match with the second "a", or aborting and
> matching against the first "a". For patterns which contain many copies of
> the initial character, each copy creates another branch.
> Also, using back-references in a regexp [sic] can be a significant
> performance killer, as it rules out the use of a DFA (which, IIRC, Python
> doesn't do anyhow) and can require brute-forcing many combinations. A
> particularly bad case is:
> 	(a*)\1
> matching against "aaaaaaaa...".
> If the performance issue is with re.match/ rather than with
> re.compile, one option is to use ctypes to access libc's regexp functions.
> Those are likely to provide better searching throughput at the expense of
> potentially increased compilation time.

Now, can you write that as a heuristic *algorithm* def 

Terry Jan Reedy

More information about the Python-list mailing list