[New-bugs-announce] [issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed

William Budd report at bugs.python.org
Tue Jun 20 22:38:32 EDT 2017


New submission from William Budd:

import re

pattern = re.compile('<div>(<p>.*?</p>)</div>', flags=re.DOTALL)

----------------------------------------------------------------

# This works as expected in the following case:

print(re.sub(pattern, '\\1',
             '<div><p>foo</p></div>\n'
             '<div><p>bar</p>123456789</div>\n'))

# which outputs:

<p>foo</p>
<div><p>bar</p>123456789</div>

----------------------------------------------------------------

# However, it does NOT work as I expect in this case:

print(re.sub(pattern, '\\1',
             '<div><p>foo</p>123456789</div>\n'
             '<div><p>bar</p></div>\n'))

# actual output:

<p>foo</p>123456789</div>
<div><p>bar</p>

# expected output:

<div><p>foo</p>123456789</div>
<p>bar</p>

----------------------------------------------------------------

It seems that the substitution only goes wrong when the candidate immediately before it turned out not to be a match. Perhaps some internal state is not reset properly in the edge case triggered by the example above?
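
One way to narrow this down is to inspect the match object directly instead of only looking at the re.sub output. The sketch below reuses the pattern and the second input from above; the names text, m, and tempered are introduced here for illustration only. It suggests that the lazy .*? combined with re.DOTALL may simply be extending across the newline until it finds a </p> that is directly followed by </div>, which would account for the actual output without any stale internal state. The tempered pattern is a hypothetical alternative, not a confirmed fix: it forbids group 1 from running past the first </p>.

```python
import re

pattern = re.compile('<div>(<p>.*?</p>)</div>', flags=re.DOTALL)
text = ('<div><p>foo</p>123456789</div>\n'
        '<div><p>bar</p></div>\n')

# Look at the single match that re.sub replaces.
m = pattern.search(text)
print(m.span())            # where the match starts and ends
print(repr(m.group(1)))    # what group 1 really captured

# Hypothetical alternative (an assumption, not part of the report):
# consume characters only while they do not start a '</p>', so the
# group cannot extend beyond the first closing tag.
tempered = re.compile('<div>(<p>(?:(?!</p>).)*</p>)</div>', flags=re.DOTALL)
print(tempered.sub('\\1', text))
```

If group 1 spans both lines here, the "actual output" above is what a single long match would produce, and the tempered variant should give the expected output instead.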

----------
components: Regular Expressions
messages: 296506
nosy: William Budd, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub substitution match group contains wrong value after unmatched pattern was processed
versions: Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30720>
_______________________________________

