[Python-Dev] (no subject)
MRAB
python at mrabarnett.plus.com
Tue Dec 26 15:15:18 EST 2017
On 2017-12-26 07:01, Yogev Hendel wrote:
>
> I don't know if this is the right place to put this,
> but I've found the following lines of code results in an incredibly long
> processing time.
> Perhaps it might be of use to someone.
>
> /import re/
> /pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/
> /pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')/
>
The pattern has a repeated repeat, which results in catastrophic
backtracking.
As an example, think about how the pattern (?:a+)+b would try to match
the string 'aaac'.
Match 'aaa', but not 'c'.
Match 'aa' and 'a', but not 'c'.
Match 'a' and 'aa', but not 'c'.
Match 'a' and 'a' and 'a', but not 'c'.
That's 4 failed attempts.
Now try match the string 'aaaac'.
Match 'aaaa', but not 'c'.
Match 'aaa' and 'a', but not 'c'.
Match 'aa' and 'aa', but not 'c'.
Match 'aa' and 'a a', but not 'c'.
Match 'a' and 'aaa', but not 'c'.
Match 'a' and 'aa' and 'a', but not 'c'.
Match 'a' and 'a aa', but not 'c'.
Match 'a' and 'a a' and 'a', but not 'c'.
That's 8 failed attempts.
Each additional 'a' in the string to match will double the number of
attempts.
Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has
a '.', which the pattern can't match, but it'll keep trying until it
finally fails.
If you add just 1 more 'a' or '-' to the string, it'll take twice as
long as it does now.
You need to think more carefully about how the pattern matches and what
it'll do when it doesn't match.
More information about the Python-Dev
mailing list