[issue35859] Capture behavior depends on the order of an alternation

Ma Lin report at bugs.python.org
Sat Feb 9 06:39:34 EST 2019


Ma Lin <malincns at 163.com> added the comment:

For each capture group, the state->mark[] array stores its begin and end positions:
begin: state->mark[(group_number-1)*2]
end:   state->mark[(group_number-1)*2+1]

So state->mark[0] is the begin of the first capture group.
state->mark[1] is the end of the first capture group.
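To make the layout concrete, here is a small Python sketch. mark_slots is a hypothetical helper written for this note, not a function in CPython's _sre module. It only applies the two index formulas above.

```python
from typing import Tuple


def mark_slots(group: int) -> Tuple[int, int]:
    """Hypothetical helper: indices of a group's begin/end in state->mark[].

    Capture groups are numbered from 1, as in the formulas above.
    """
    if group < 1:
        raise ValueError("capture groups are numbered from 1")
    begin = (group - 1) * 2
    end = begin + 1
    return begin, end


for g in (1, 2, 3):
    print(g, mark_slots(g))   # group 1 -> (0, 1), group 2 -> (2, 3), ...
```

Group 1 therefore owns MARK-0 and MARK-1, which is why the trace for this pattern only involves those two marks.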

re.search(r'(ab|a)*?b', 'ab')
Here is a simplified record of the matcher's actions for this call:

01  MARK 0
02  "a":  first "a" in the pattern [SUCCESS]
03  BRANCH
04    "b": first "b" in the pattern [SUCCESS]
05    MARK 1
06    "b": second "b" in the pattern [FAIL]
07    try next (ab|a)*? [FAIL]
08      MARK 0
09      "a":  first "a" in the pattern [FAIL]
10  BRANCH: try next branch
11    "": the second branch [SUCCESS]
12    MARK 1
13    "b": second "b" in the pattern [SUCCESS]

The MARK_PUSH(lastmark) macro does not protect MARK-0 when it is the only mark set, yet the BRANCH opcode relies on this macro to protect the capture groups before trying each branch.
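The effect of the missing save can be replayed with a toy model. This sketch is NOT CPython's C code: mark_push, mark_pop and run_trace are hypothetical names, the "buggy" guard simply encodes the behaviour described above (nothing is saved when lastmark is 0), and the string positions are ones worked out by hand for the input 'ab' by following the trace.

```python
# Toy model (not CPython's implementation) of the mark save/restore
# around BRANCH. `marks` stands in for state->mark[]; `lastmark` is the
# index of the highest mark set so far.


def mark_push(stack, marks, lastmark, buggy):
    """Save marks[0..lastmark]; return True if anything was pushed."""
    # Assumed buggy guard: skip the save when lastmark == 0, i.e. when
    # MARK-0 is the only mark set. The fixed guard saves in that case too.
    threshold = 0 if buggy else -1
    if lastmark > threshold:
        stack.append(marks[: lastmark + 1])   # slice makes a copy
        return True
    return False


def mark_pop(stack, marks, pushed):
    """Restore whatever mark_push saved (no-op if nothing was saved)."""
    if pushed:
        saved = stack.pop()
        marks[: len(saved)] = saved


def run_trace(buggy):
    """Replay the trace for re.search(r'(ab|a)*?b', 'ab').

    Returns (begin, end) of capture group 1 as positions in 'ab'.
    """
    marks = [-1, -1]   # [begin, end] of group 1; -1 means unset
    stack = []

    marks[0] = 0       # line 01: MARK 0 at position 0
    lastmark = 0
    # line 02: "a" matches, now at position 1
    pushed = mark_push(stack, marks, lastmark, buggy)   # line 03: BRANCH
    saved_lastmark = lastmark

    # lines 04-09: "b" matches (position 2), MARK 1 at 2, the tail "b"
    # fails, the next iteration writes MARK 0 at 2, then "a" fails.
    marks[1] = 2
    lastmark = 1
    marks[0] = 2

    # line 10: first branch failed; restore marks, try the next branch
    mark_pop(stack, marks, pushed)
    lastmark = saved_lastmark

    # lines 11-13: empty branch succeeds, MARK 1 at position 1, "b" matches
    marks[1] = 1
    return marks[0], marks[1]


for buggy in (True, False):
    begin, end = run_trace(buggy)
    print("buggy" if buggy else "fixed", (begin, end), repr("ab"[begin:end]))
```

In the buggy variant, group 1 comes out as 'ab'[2:1], an empty string, because MARK-0 still holds the value written at line 08. In the fixed variant the pop restores position 0 and group 1 is 'a'.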

As a result, capture group 1 ends up as [MARK-0 at line 08, MARK-1 at line 12), which is wrong.
The correct capture group 1 is [MARK-0 at line 01, MARK-1 at line 12).
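The reported behaviour can be checked with the public re API by running both orderings of the alternation side by side. On an affected interpreter we would expect the ab-first pattern to report an empty group 1 while the a-first pattern reports 'a'; once the bug is fixed, both should report 'a'. Which releases include a fix is not stated in this comment.

```python
import re

# The same two alternatives in both orders. In the a-first ordering the
# empty branch is tried first and succeeds, so no failed branch ever
# needs its marks restored.
for pattern in (r'(ab|a)*?b', r'(a|ab)*?b'):
    m = re.search(pattern, 'ab')
    # group(0) is 'ab' for both patterns; group(1) should be 'a' for
    # both, but an affected interpreter reports '' for the first one.
    print(pattern, repr(m.group(0)), repr(m.group(1)))
```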

----------
versions: +Python 3.7, Python 3.8 -Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35859>
_______________________________________