more than 100 capturing groups in a regex
tim.peters at gmail.com
Thu Oct 27 18:57:59 CEST 2005
>> It's a conflict between python's syntax for regex back references
>> and octal number literals. Probably wasn't noticed until way too
>> late, and now it will never change.
[skip at pobox.com]
> I suspect it comes from Perl, since Python's regular expression engine tries
> pretty hard to be compatible with Perl's, at least for the basics.
"No" to all the above <wink>. The limitation to 99 in backreference
notation was thoroughly discussed on the Python String-SIG at the
time, and it was deliberately not bug-compatible with the Perl of that
time.
In the Perl of that time (no idea what's true now), e.g., \123 in a
regexp was an octal escape if it appeared before or within the 123rd
capturing group, but was a backreference to the 123rd capturing group
if it appeared after the 123rd capturing group. So, yes, two
different instances of "\123" in a single regexp could have different
meanings (meaning chr(83) in one place, and a backreference to group
123 in another, and there's no way to tell the difference without
counting the number of preceding capturing groups).
That's so horridly un-Pythonic that we drew the line there. Nobody
had a sane use case for more than 99 backreferences, so "who cares?"
Note that this isn't a reason for limiting the number of capturing
groups. It only accounts for why we didn't care that you couldn't
write a _backreference_ to a capturing group higher than number 99
using "\nnn" notation.
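For what it's worth, here is a minimal sketch of how this plays out in Python's re module. It assumes a Python recent enough that re accepts 100 or more capturing groups; the variable names and the group name "last" are made up for illustration.

```python
import re

# Three octal digits are always an octal escape in Python's re:
# "\123" is chr(0o123) == "S", no matter how many groups precede it.
octal_match = re.fullmatch(r"\123", "S")

# One or two digits (not starting with 0) are a backreference.
backref_match = re.fullmatch(r"(a)\1", "aa")

# Build a pattern with exactly 100 capturing groups. "\100" cannot refer
# to group 100 (it is the octal escape for "@"), but a named
# backreference can reach that group.
hundred_groups = re.compile("(a)" * 99 + "(?P<last>b)(?P=last)")
named_match = hundred_groups.fullmatch("a" * 99 + "bb")

# Even after 100 plain groups, "\100" still matches a literal "@".
octal_after_100 = re.fullmatch("(a)" * 100 + r"\100", "a" * 100 + "@")
```

All four matches succeed, and named_match.group(100) returns "b". So the 99 limit only restricts the "\nnn" spelling of a backreference; the group itself is still there and reachable by name.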