more than 100 capturing groups in a regex

Tim Peters tim.peters at gmail.com
Thu Oct 27 18:57:59 CEST 2005


[DH]
>> It's a conflict between python's syntax for regex back references
>> and octal number literals.  Probably wasn't noticed until way too
>> late, and now it will never change.

[skip at pobox.com]
> I suspect it comes from Perl, since Python's regular expression engine tries
> pretty hard to be compatible with Perl's, at least for the basics.

"No" to all the above <wink>.  The limitation to 99 in backreference
notation was thoroughly discussed on the Python String-SIG at the
time, and it was deliberately not bug-compatible with the Perl of that
time.

In the Perl of that time (no idea what's true now), e.g., \123 in a
regexp was an octal escape if it appeared before or within the 123rd
capturing group, but was a backreference to the 123rd capturing group
if it appeared after the 123rd capturing group.  So, yes, two
different instances of "\123" in a single regexp could have different
meanings (meaning chr(83) in one place, and a backreference to group
123 in another, and there's no way to tell the difference without
counting the number of preceding capturing groups).

That's so horridly un-Pythonic that we drew the line there.  Nobody
had a sane use case for more than 99 backreferences, so "who cares?"
won.

Note that this isn't a reason for limiting the number of capturing
groups.  It only accounts for why we didn't care that you couldn't
write a _backreference_ to a capturing group higher than number 99
using "\nnn" notation.
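
To make the distinction concrete, here is a minimal sketch of how this plays out in a present-day Python `re` module. It assumes Python 3.5 or later, where a pattern may contain more than 100 capturing groups; the group name `tail` and the variable names are invented for illustration and do not come from the original thread.

```python
import re

# In a Python pattern, "\123" is always the octal escape for
# chr(0o123) == chr(83) == "S", no matter how many capturing
# groups come before it.
octal_match = re.fullmatch(r"\123", "S")

# One- and two-digit escapes that do not start with 0 are
# backreferences, so "\1" and "\2" refer to groups 1 and 2.
backref_match = re.fullmatch(r"(a)(b)\2\1", "abba")

# A group numbered above 99 cannot be reached with "\nnn", but it
# can still be referenced by name. "tail" is a hypothetical name
# given to group 120 here.
many_groups = "(x)" * 119 + "(?P<tail>y)" + "(?P=tail)"
named_match = re.fullmatch(many_groups, "x" * 119 + "yy")

print(octal_match is not None)    # \123 matched the literal "S"
print(backref_match is not None)  # \2\1 matched "ba"
print(named_match.re.groups)      # number of capturing groups
```

The named form sidesteps the numbering question entirely: the meaning of `(?P=tail)` does not depend on counting the groups that precede it.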
