[Python-Dev] Behavior of matching backreferences

Barry Scott barry@barrys-emacs.org
Sat, 22 Jun 2002 20:39:19 +0100

I think the re module worked correctly.

If you write your expression without the ambiguity:

yours: "^(?P<a>a)?(?P=a)$"
re-1a: "^((?P<a>a)(?P=a))?$"
re-2a: "^(?P<a>a?)(?P=a)$"

Your test data, 'ebc', does not match either 'aa' or ''. Try removing
the $ so that the pattern can match '' at the start of the string.

re-1b: "^((?P<a>a)(?P=a))?"
re-2b: "^(?P<a>a?)(?P=a)"
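A quick way to compare these forms is to run each one against a few inputs. The sketch below is only illustrative: the patterns are copied from this message, while the dictionary labels, the helper name matched_text, and the extra test strings 'aa', '' and 'a' are assumptions added for the comparison. It reflects the behavior of the re module in a current CPython, where a backreference to a group that did not participate in the match fails.

```python
import re

# Patterns copied from the message; the keys are only labels.
patterns = {
    "yours": r"^(?P<a>a)?(?P=a)$",
    "re-1a": r"^((?P<a>a)(?P=a))?$",
    "re-2a": r"^(?P<a>a?)(?P=a)$",
    "re-1b": r"^((?P<a>a)(?P=a))?",
    "re-2b": r"^(?P<a>a?)(?P=a)",
}

# 'ebc' is the test data from the thread; the others are added for contrast.
subjects = ["aa", "", "a", "ebc"]


def matched_text(pattern, subject):
    """Return the matched text, or None when the pattern does not match."""
    m = re.match(pattern, subject)
    return None if m is None else m.group(0)


results = {
    label: [matched_text(pattern, s) for s in subjects]
    for label, pattern in patterns.items()
}

for label, row in results.items():
    print(label, row)
```

With the $ in place, none of the forms matches 'ebc'. Without it, re-1b and re-2b both match the empty string at the start of 'ebc'.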

I think the re-2b form is the way to deal with the optional quotes.

I'm not sure a patch is needed for this.


-----Original Message-----
From: python-dev-admin@python.org [mailto:python-dev-admin@python.org] On
Behalf Of Gustavo Niemeyer
Sent: 21 June 2002 06:07
To: python-dev@python.org
Subject: [Python-Dev] Behavior of matching backreferences

Hi everyone!

I was studying the sre module, when I came up with the following
regular expression:

"^(?P<a>a)?(?P=a)$"

The (?P=a) matches with whatever was matched by the "a" group. If
"a" is optional and doesn't match, it seems to make sense that
(?P=a) becomes optional as well, instead of failing. Otherwise the
regular expression above will always fail if the first group
fails, even though that group is optional.

One could argue that to make it a valid regular expression, it should
become "^(?P<a>a)?(?P=a)?". But that's a different regular expression,
since it would match "a", while the regular expression above would
match "aa" or "", but not "a".
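The difference between the two patterns can be checked directly. This is a minimal sketch, not part of the patch: the names strict and relaxed are made up for illustration, and a trailing $ is appended to the second pattern (an assumption, it is not in the text above) so that both are compared on whole strings.

```python
import re

# Pattern under discussion: the backreference itself is mandatory.
strict = re.compile(r"^(?P<a>a)?(?P=a)$")

# Alternative with an optional backreference; the trailing $ is an
# assumption added here so both patterns are tested on whole strings.
relaxed = re.compile(r"^(?P<a>a)?(?P=a)?$")

subjects = ["", "a", "aa"]

strict_hits = {s: strict.match(s) is not None for s in subjects}
relaxed_hits = {s: relaxed.match(s) is not None for s in subjects}

for s in subjects:
    print(repr(s), "strict:", strict_hits[s], "relaxed:", relaxed_hits[s])
```

The relaxed pattern accepts "a", which is exactly the case the stricter pattern is meant to exclude. In a current CPython the strict pattern also rejects "", since a backreference to a group that did not participate fails; accepting "" is the behavior the proposal asks for.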

This kind of pattern is useful, for example, to match a string which
could be optionally surrounded by quotes, like shell variables. Here's
an example of such pattern: r"^(?P<a>')?((?:\\'|[^'])*)(?P=a)$".
This pattern matches "'a'", "\'a", "a\'a", "'a\'a'" and all such
variants, but not "'a", "a'", or "a'a".
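For comparison, here is a sketch of the same quote matcher written in the re-2b style from the reply, with the optional quote moved inside the group so that the group always participates and the backreference can match an empty string. The variable names and the accept/reject lists are illustrative assumptions; the test strings are the ones listed above, written as raw strings so that the backslash is literal.

```python
import re

# The opening quote is optional *inside* the group, so group "a" always
# matches (possibly the empty string) and (?P=a) repeats whatever it held.
quoted = re.compile(r"^(?P<a>'?)((?:\\'|[^'])*)(?P=a)$")

accept = ["'a'", r"\'a", r"a\'a", r"'a\'a'"]
reject = ["'a", "a'", "a'a"]

accepted = [s for s in accept + reject if quoted.match(s)]
rejected = [s for s in accept + reject if not quoted.match(s)]

print("matched:    ", accepted)
print("not matched:", rejected)
```

Group 2 holds the text between the optional quotes, for example quoted.match("'a'").group(2) gives "a".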

I've submitted a patch to make this work to http://python.org/sf/571976

Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]
