[issue35859] Capture behavior depends on the order of an alternation
Ma Lin
report at bugs.python.org
Sat Feb 9 06:39:34 EST 2019
Ma Lin <malincns at 163.com> added the comment:
For a capture group, the state->mark[] array stores its begin and end positions:
begin: state->mark[(group_number-1)*2]
end: state->mark[(group_number-1)*2+1]
So state->mark[0] is the beginning of the first capture group,
and state->mark[1] is the end of the first capture group.
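The index arithmetic above can be sketched in a few lines. `mark_slots` is a hypothetical helper that only mirrors the layout described for state->mark[]; it is not part of the `re` module or of the sre C code.

```python
def mark_slots(group_number: int) -> tuple:
    """Return (begin, end) indices into state->mark[] for a 1-based group number.

    Hypothetical helper: it only reproduces the layout described above.
    """
    if group_number < 1:
        raise ValueError("capture groups are numbered from 1")
    begin = (group_number - 1) * 2
    return begin, begin + 1


print(mark_slots(1))  # (0, 1): state->mark[0] and state->mark[1]
print(mark_slots(3))  # (4, 5)
```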
re.search(r'(ab|a)*?b', 'ab')
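The symptom can be checked by running the same search with the two alternatives swapped. `group1_for` is a hypothetical helper; only `re.search`, `Match.group` and `Match.span` are standard library calls. With `(a|ab)` group 1 is 'a' at (0, 1). With `(ab|a)` the correct answer is the same, but on an interpreter affected by this issue group 1 may come back with a different span (for example an empty string) instead.

```python
import re


def group1_for(pattern: str, text: str = 'ab'):
    """Return (overall match, span of group 1, text of group 1). Hypothetical helper."""
    m = re.search(pattern, text)
    return m.group(0), m.span(1), m.group(1)


# Same input and the same alternatives, only their order differs.
for pattern in (r'(a|ab)*?b', r'(ab|a)*?b'):
    print(pattern, group1_for(pattern))
```

The overall match is 'ab' for both patterns; only the capture group is affected.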
For re.search(r'(ab|a)*?b', 'ab'), here is a simplified record of the matching actions:
01 MARK 0
02 "a": first "a" in the pattern [SUCCESS]
03 BRANCH
04 "b": first "b" in the pattern [SUCCESS]
05 MARK 1
06 "b": second "b" in the pattern [FAIL]
07 try next (ab|a)*? [FAIL]
08 MARK 0
09 "a": first "a" in the pattern [FAIL]
10 BRANCH: try next branch
11 "": the second branch [SUCCESS]
12 MARK 1
13 "b": second "b" in the pattern [SUCCESS]
The MARK_PUSH(lastmark) macro doesn't save MARK-0 when it is the only mark that has been set, yet the BRANCH op relies on this macro to protect the capture groups before trying a branch.
As a result, capture group 1 is [MARK-0 at line 08, MARK-1 at line 12), which is wrong.
The correct capture group 1 is [MARK-0 at line 01, MARK-1 at line 12).
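To make the failure concrete, here is a toy model that replays the action record above against a two-slot mark array. It is not the real sre engine: `replay_trace`, `set_mark` and the `protect_single_mark` switch are hypothetical names, and the save condition is modeled on the description above (the buggy variant saves nothing unless a mark beyond MARK-0 is set).

```python
def replay_trace(protect_single_mark: bool) -> tuple:
    """Replay the simplified record for re.search(r'(ab|a)*?b', 'ab').

    Toy model only; returns (mark[0], mark[1]) for capture group 1.
    """
    marks = [-1, -1]  # marks[0] = begin of group 1, marks[1] = end of group 1
    lastmark = -1     # index of the highest mark set so far

    def set_mark(index: int, pos: int) -> None:
        nonlocal lastmark
        marks[index] = pos
        lastmark = max(lastmark, index)

    set_mark(0, 0)  # 01 MARK 0 at position 0; 02 "a" matches
    # 03 BRANCH: save marks before the first alternative.
    # The buggy variant only saves when lastmark > 0 (assumed from the report).
    saved_lastmark = lastmark
    min_lastmark = 0 if protect_single_mark else 1
    saved = marks[:lastmark + 1] if lastmark >= min_lastmark else None
    set_mark(1, 2)  # 04 "b" matches; 05 MARK 1 at position 2
    set_mark(0, 2)  # 06-08 second "b" fails, next repetition sets MARK 0 at 2
    # 09 "a" fails; 10 backtrack into BRANCH and restore whatever was saved
    lastmark = saved_lastmark
    if saved is not None:
        marks[:len(saved)] = saved
    set_mark(1, 1)  # 11 empty alternative succeeds; 12 MARK 1 at position 1
    return marks[0], marks[1]  # 13 "b" matches: overall success


text = 'ab'
for protect in (False, True):
    begin, end = replay_trace(protect)
    print(protect, (begin, end), repr(text[begin:end]))
# False (2, 1) ''
# True (0, 1) 'a'
```

In the buggy variant the stale MARK-0 from line 08 survives the backtrack, so the slice is empty; saving the single mark restores line 01's position.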
----------
versions: +Python 3.7, Python 3.8 -Python 3.5
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________