[issue40027] re.sub inconsistency beginning with 3.7
Wayne Davison
report at bugs.python.org
Fri Mar 20 13:21:13 EDT 2020
New submission from Wayne Davison <wayne.davison at mariadb.com>:
There is an inconsistency in re.sub() when substituting at the end of a string with a pattern that uses a '*' quantifier before the end-of-string anchor: the substitution now occurs twice. For example:
txt = re.sub(r'\s*\Z', "\n", txt)
This should work like txt.rstrip() + "\n", but beginning in 3.7, the re.sub version matches twice and changes any non-empty trailing whitespace into "\n\n" instead of "\n". (If there is no trailing whitespace, it matches only once.)
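A minimal sketch of that comparison follows. It is not the attached sub-bug.py; the helper names and sample strings are made up for illustration. re.subn() is used only because it also reports how many substitutions were made.

```python
import re
import sys

def normalize_with_sub(txt):
    # The substitution from the report; subn() returns (new_text, count).
    return re.subn(r'\s*\Z', "\n", txt)

def normalize_with_rstrip(txt):
    # The behavior the report expects the substitution to have.
    return txt.rstrip() + "\n"

# Hypothetical sample inputs: no trailing whitespace, then some.
for sample in ["foo", "foo  ", "foo \n\n"]:
    new_text, count = normalize_with_sub(sample)
    print(sys.version_info[:2], repr(sample), "->", repr(new_text),
          "substitutions:", count,
          "rstrip version:", repr(normalize_with_rstrip(sample)))
```

On 3.7 and later, the inputs with trailing whitespace should report two substitutions. The input without trailing whitespace reports one on every version.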
The bug is the same if '$' is used instead of '\Z', but it does not happen if an actual character is specified (e.g. a substitution of r'\s*x' does not substitute twice if x has preceding whitespace).
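The two cases in this paragraph can be checked the same way. This is only a sketch: the input strings and the "-" replacement are made up for illustration.

```python
import re

# '$' in place of '\Z': a non-empty match of the trailing whitespace,
# after which 3.7+ also replaces the empty match at the very end.
dollar_text, dollar_count = re.subn(r'\s*$', "\n", "foo  ")

# A literal character after '\s*': nothing is left to match once the
# 'x' has been consumed, so only one substitution is made.
literal_text, literal_count = re.subn(r'\s*x', "-", "foo  x")

print(repr(dollar_text), dollar_count)
print(repr(literal_text), literal_count)
```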
I tested 2.7.17, 3.6.9, 3.7.7, 3.8.2, and 3.9.0a4, and it starts to fail in 3.7.7 and beyond.
Attached is a test program.
----------
components: Regular Expressions
files: sub-bug.py
messages: 364688
nosy: Wayne Davison, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub inconsistency beginning with 3.7
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48990/sub-bug.py
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40027>
_______________________________________