[Python-Dev] Behavior of matching backreferences
Fri, 21 Jun 2002 02:07:25 -0300
I was studying the sre module, when I came up with the following
The (?P=a) matches with whatever was matched by the "a" group. If
"a" is optional and doesn't match, it seems to make sense that
(?P=a) becomes optional as well, instead of failing. Otherwise the
regular expression above will always fail if the first group
doesn't match, even though that group is optional.
One could argue that to make it a valid regular expression, it should
become "^(?P<a>a)?(?P=a)?". But that's a different regular expression,
since it would match "a", while the regular expression above would
match "aa" or "", but not "a".
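To make the difference concrete, here is a minimal sketch. It assumes a Python whose re module supports the conditional syntax (?(name)...); the `emulated` pattern and the `accepted` helper are illustrative names of mine, and the conditional is only a stand-in for the behavior proposed here, not the patch itself.

```python
import re

# Pattern quoted above: both the group and the backreference are optional.
both_optional = re.compile(r"^(?P<a>a)?(?P=a)?")

# Hypothetical emulation of the proposed semantics: the backreference is
# required only when group "a" actually took part in the match.
emulated = re.compile(r"^(?P<a>a)?(?(a)(?P=a))$")

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if pattern.fullmatch(s)]

candidates = ["", "a", "aa"]
print(accepted(both_optional, candidates))  # includes "a"
print(accepted(emulated, candidates))       # excludes "a"
```

The first pattern accepts a lone "a" because the backreference may simply be skipped; the emulated one does not, which is the distinction drawn above.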
This kind of pattern is useful, for example, to match a string which
could be optionally surrounded by quotes, like shell variables. Here's
an example of such a pattern: r"^(?P<a>')?((?:\\'|[^'])*)(?P=a)$".
This pattern matches "'a'", "\'a", "a\'a", "'a\'a'" and all such
variants, but not "'a", "a'", or "a'a".
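As a rough illustration of the intended behavior, the sketch below swaps the trailing (?P=a) for a conditional, (?(a)'), which demands the closing quote only when the opening one was matched. This assumes an re module with conditional support; as far as I know, a stock re module without the patch makes the original pattern fail on unquoted input, because a backreference to an unmatched group fails. The `quoted` and `unquote` names are hypothetical.

```python
import re

# Hypothetical variant of the pattern above: (?(a)') requires the closing
# quote only if the opening quote (group "a") was matched.
quoted = re.compile(r"^(?P<a>')?((?:\\'|[^'])*)(?(a)')$")

def unquote(text):
    """Return the body without surrounding quotes, or None if unbalanced."""
    m = quoted.match(text)
    return m.group(2) if m else None

# The last three samples have unbalanced quotes and should be rejected.
for sample in ["'a'", "\\'a", "a\\'a", "'a\\'a'", "'a", "a'", "a'a"]:
    print(repr(sample), "->", repr(unquote(sample)))
```

Group 2 holds the body with the surrounding quotes stripped; escaped quotes inside the body are left untouched.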
I've submitted a patch that makes this work: http://python.org/sf/571976
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]