[New-bugs-announce] [issue46627] Regex hangs indefinitely
report at bugs.python.org
Thu Feb 3 13:40:17 EST 2022
New submission from J.B. Langston <jblangston at datastax.com>:
The following code causes Python's regex engine to hang, apparently indefinitely:
import re
message = "Flushed to [BigTableReader(path='/data/cassandra/data/log/logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] (1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
regex = re.compile(r"Flushed to \[(?P<sstables>[^]]+)+\] \((?P<sstable_count>[^ ]+) sstables, (?P<total_size>[^)]+)\), biggest (?P<biggest_size>[^,]+), smallest (?P<smallest_size>[^ ]+)( \((?P<duration>\d+)ms\))?")
regex.match(message)  # the match attempt is what hangs; compiling alone returns immediately
This may be a case of exponential backtracking similar to #35915 or #30973. Both of those issues were closed as Won't Fix, and I suspect mine is similar. The use of commas as decimal points in the input string was not anticipated; it happened because the logs the message came from were localized. The regex works properly when the decimal point is a period.
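A minimal sketch of one possible rewrite, assuming the nested quantifier in the sstables group (`[^]]+` followed by a second `+`) is the source of the backtracking. The pattern is otherwise the same as the one above; the names FIXED, comma_msg and period_msg are illustrative only.

```python
import re

# Same pattern as in the report, except that the outer "+" after the
# sstables group has been dropped, so the group can match in only one way.
FIXED = re.compile(
    r"Flushed to \[(?P<sstables>[^]]+)\] "
    r"\((?P<sstable_count>[^ ]+) sstables, "
    r"(?P<total_size>[^)]+)\), "
    r"biggest (?P<biggest_size>[^,]+), "
    r"smallest (?P<smallest_size>[^ ]+)"
    r"( \((?P<duration>\d+)ms\))?"
)

# The localized message from the report (comma as decimal separator).
comma_msg = (
    "Flushed to [BigTableReader(path='/data/cassandra/data/log/"
    "logEntry_202202-e68971800b2711ecaf770d5fa3f5ae87/md-112-big-Data.db')] "
    "(1 sstables, 8,650MiB), biggest 8,650MiB, smallest 8,650MiB"
)
# The same message with a period as decimal separator.
period_msg = comma_msg.replace("8,650MiB", "8.650MiB")

no_match = FIXED.match(comma_msg)   # fails to match, but returns promptly
ok_match = FIXED.match(period_msg)  # matches as before
```

With this change the comma-separated input still does not match, because `[^,]+` stops at the first comma inside "8,650MiB", but the failure is reported right away instead of after an exponential search.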
I will try to rewrite my regex to address this specific issue, but it is hard to anticipate every possible input and craft a bulletproof regex, so this kind of behavior can be used for a denial-of-service attack (intentional or not). In this case the regex was used in an automated import process and caused the process to back up for many hours before someone noticed. One possible solution would be to add a timeout option to the regex engine so that it gives up and raises an exception if a match runs longer than the configured timeout.
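Until such an option exists, a timeout can be approximated from the outside. The following is a rough sketch, not an API of the re module: it runs the search in a child interpreter and relies on the timeout argument of subprocess.run to kill it. The helper name search_with_timeout, the exit codes and the default timeout are all hypothetical choices made for this example.

```python
import subprocess
import sys

# Script run in the child interpreter. Exit code 0 means "matched",
# 3 means "no match"; any other code is treated as an error.
_CHILD = (
    "import re, sys\n"
    "pattern, text = sys.argv[1], sys.argv[2]\n"
    "sys.exit(0 if re.search(pattern, text) else 3)\n"
)

def search_with_timeout(pattern, text, timeout_s=5.0):
    """Hypothetical helper: return True/False, or raise TimeoutError."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(
            f"regex search exceeded {timeout_s} seconds"
        ) from None
    if proc.returncode == 0:
        return True
    if proc.returncode == 3:
        return False
    raise RuntimeError(f"child exited with status {proc.returncode}")
```

Starting an interpreter per search is expensive and only a yes/no answer comes back, so this is suited to guarding a batch job against a stuck pattern rather than to general use.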
components: Regular Expressions
nosy: ezio.melotti, jblangston, mrabarnett
title: Regex hangs indefinitely
versions: Python 3.8
Python tracker <report at bugs.python.org>