[New-bugs-announce] [issue46627] Regex hangs indefinitely

J.B. Langston report at bugs.python.org
Thu Feb 3 13:40:17 EST 2022

New submission from J.B. Langston <jblangston at datastax.com>:

The following code will cause Python's regex engine to hang apparently indefinitely: 

import re
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P<sstables>[^]]+)+\] \((?P<sstable_count>[^ ]+) sstables, (?P<total_size>[^)]+)\), biggest (?P<biggest_size>[^,]+), smallest (?P<smallest_size>[^ ]+)( \((?P<duration>\d+)ms\))?")

This may be a case of exponential backtracking similar to #35915 or #30973. Both of these issues have been closed as Wont Fix, and I suspect my issue is similar. The use of commas for decimal points in the input string was not anticipated but happened due to localization of the logs that the message came from.  The regex works properly when the decimal point is a period.

I will try to rewrite my regex to address this specific issue, but it's hard to anticipate every possible input and craft a bulletproof regex, so something like this kind of thing can be used for a denial of service attack (intentional or not). In this case the regex was used in an automated import process and caused the process to back up for many hours before someone noticed.  Maybe a solution could be to add a timeout option to the regex engine so it will give up and throw an exception if the regex executes for longer than the configured timeout.

components: Regular Expressions
messages: 412450
nosy: ezio.melotti, jblangston, mrabarnett
priority: normal
severity: normal
status: open
title: Regex hangs indefinitely
type: behavior
versions: Python 3.8

Python tracker <report at bugs.python.org>

More information about the New-bugs-announce mailing list