[New-bugs-announce] [issue43726] regex module fails with a quantified backref but succeeds with repeated backref

David Ellsworth report at bugs.python.org
Sun Apr 4 03:52:01 EDT 2021


New submission from David Ellsworth <davidell at earthling.net>:

The regex /^((x*)\2{3}(?=\2$))*x$/ matches powers of 5 written in unary, that is, as a string of "x" characters whose length is the number.
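
For readers unfamiliar with unary regexes, here is a rough sketch of the arithmetic the pattern appears to perform. Each pass through the outer group captures k characters in group 2, consumes 3k more, and requires exactly k to remain, so the length before the pass must be 5k. The function name is_power_of_5_unary is hypothetical, used only for illustration:

```python
def is_power_of_5_unary(n):
    """Mirror the regex on a string of n "x" characters (n >= 0).

    Each loop pass corresponds to one iteration of the outer group:
    it only succeeds if the remaining length is a multiple of 5,
    and it leaves one fifth of that length behind.
    """
    while n > 1:
        if n % 5 != 0:
            return False  # no k satisfies 5 * k == n, so the group cannot match
        n //= 5           # 4k characters consumed, k characters remain
    return n == 1         # the trailing "x$" needs exactly one character left
```

Under this reading, a length of 125 takes three passes (125, 25, 5) before the final "x" matches.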

The following command line should print "1", but prints nothing:
python -c 'import regex; regex.match(r"^((x*)\2{3}(?=\2$))*x$", "x"*125) and print(1)'

However, this command does print "1":
python -c 'import regex; regex.match(r"^((x*)\2\2\2(?=\2$))*x$", "x"*125) and print(1)'

And so does this one:
python -c 'import re; re.match(r"^((x*)\2{3}(?=\2$))*x$", "x"*125) and print(1)'

The expression "\2\2\2" should behave exactly the same as "\2{3}", but in the "regex" module it does not.
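
To check more than the single length 125, something like the following sketch could be used. It lists every length up to a limit where the two spellings of the pattern disagree. The helper names matching_lengths and disagreements are hypothetical, and the third-party "regex" module is only exercised if it happens to be installed:

```python
import re

QUANTIFIED = r"^((x*)\2{3}(?=\2$))*x$"  # backreference with a quantifier
REPEATED = r"^((x*)\2\2\2(?=\2$))*x$"   # same backreference written out three times


def matching_lengths(module, pattern, limit=130):
    """Return every n in 0..limit for which "x" * n matches the pattern."""
    compiled = module.compile(pattern)
    return [n for n in range(limit + 1) if compiled.match("x" * n)]


def disagreements(module, limit=130):
    """Return the lengths on which the two patterns give different answers."""
    quantified = set(matching_lengths(module, QUANTIFIED, limit))
    repeated = set(matching_lengths(module, REPEATED, limit))
    return sorted(quantified ^ repeated)


if __name__ == "__main__":
    print("re:", disagreements(re))
    try:
        import regex  # third-party module; may not be installed
    except ImportError:
        print("regex: not installed, skipped")
    else:
        print("regex:", disagreements(regex))
```

An empty list means the two patterns agree on every tested length for that module; any length that shows up is a counterexample like the one reported here.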

I discovered this bug while solving the following Code Golf Stack Exchange challenge:
https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792

----------
components: Regular Expressions
messages: 390175
nosy: Davidebyzero, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: regex module fails with a quantified backref but succeeds with repeated backref
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43726>
_______________________________________

