
On Mon, Feb 14, 2022 at 9:55 AM J.B. Langston <jblangston@datastax.com> wrote:
... more generally I think it would be good to have a timeout option that could be configurable when compiling the regex so that if the regex didn't complete within the specified timeframe, it would abort and throw an exception.
... The suggestion made to me on the bug was to read Mastering Regular Expressions and get better at writing regexes. I will take this advice, but this isn't really a reasonable solution to my problem for a few reasons. My use case is log parsing and I have a large number of regexes that run over many different log lines. With the volume of regexes I have, it's hard to make sure every regex has no potential problems, especially when the pathological behavior only occurs on certain inputs that may not have been anticipated when developing the regex.
A regex that's vulnerable to pathological behavior is a DoS attack waiting to happen, especially when it is used for parsing log data (which might contain untrusted data). If possible, we should make it harder for people to shoot themselves in the foot. As an aside, pure regexes are not inherently vulnerable (an automaton-based matcher can handle them in linear time); only extended regexes are. However, Python doesn't (that I know of) have a class that only accepts pure regexes.
--- Bruce
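
For what it's worth, a timeout like the one requested above can be approximated today with only the standard library, by running the match in a child process and killing it if it overruns. This is a minimal sketch, not a proposed API: search_with_timeout and RegexTimeout are hypothetical names, and passing the pattern and text on the command line is an assumption that only holds for short inputs without NUL characters.

```python
import subprocess
import sys

# Script run in the child interpreter: argv[1] is the pattern, argv[2] the text.
# It prints "1" if re.search finds a match and "0" otherwise.
_CHILD = (
    "import re, sys\n"
    "m = re.search(sys.argv[1], sys.argv[2])\n"
    "print(1 if m else 0)\n"
)


class RegexTimeout(Exception):
    """Hypothetical exception raised when a match exceeds its time budget."""


def search_with_timeout(pattern, text, timeout):
    """Return True if pattern matches somewhere in text, False otherwise.

    Raises RegexTimeout if the child process has not finished after
    `timeout` seconds. An invalid pattern makes the child exit non-zero,
    which surfaces as subprocess.CalledProcessError.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child on expiry
            check=True,
        )
    except subprocess.TimeoutExpired:
        raise RegexTimeout(
            f"{pattern!r} did not finish within {timeout} s"
        ) from None
    return proc.stdout.strip() == "1"


if __name__ == "__main__":
    print(search_with_timeout(r"\d+", "abc 123", 10.0))
    try:
        # Nested quantifiers plus a failing tail force exponential backtracking.
        search_with_timeout(r"(a+)+$", "a" * 40 + "b", 1.0)
    except RegexTimeout as exc:
        print("aborted:", exc)
```

Starting an interpreter per match is far too slow for high-volume log parsing, so in practice one would batch many lines per child. The sketch only shows that an abort-on-timeout is achievable without changes to re itself.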