[Python-Dev] Behavior of matching backreferences
Sun, 23 Jun 2002 14:28:53 -0400
[Gustavo Niemeyer, on the behavior of
Python and Perl work exactly the same way for the equivalent (but spellable
in Perl) regexp
matching the two strings
and nothing else. That's what I expected. You didn't give a concrete
example of what you think it should do instead. It may have been your
intent to say that you believe the regexp *should* match the string
but you didn't really say so one way or the other. Regardless, neither
Python nor Perl matches ebc in this case, and that's intended.
The Rule, in vague English, is that a backreference matches the same text as
was matched by the referenced group; if the referenced group didn't match
any text, then the backreference can't match either. Note that whether the
referenced group matched any text is a different question than whether the
referenced group is *used* in the match. This is a subtle point I suspect
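
A minimal sketch of The Rule in Python, for anyone who wants to poke at it. The two patterns below are made up for illustration and are not the regexps quoted in this thread; the names same_text and unset_group are likewise hypothetical. The first shows a backreference matching only the same text the group matched; the second shows a backreference failing when its group matched no text at all.

```python
import re

# Illustrative patterns chosen for this sketch; not quoted from the thread.

# A backreference matches only the exact text its group matched.
same_text = re.compile(r"^(\w+) \1$")
print(same_text.match("abc abc") is not None)  # group 1 is "abc", \1 repeats it
print(same_text.match("abc abd") is not None)  # "abd" is not what group 1 matched

# If the group matched no text, the backreference cannot match either.
unset_group = re.compile(r"^(x)?\1$")
print(unset_group.match("xx").groups())  # group 1 matched "x", \1 matched "x"
print(unset_group.match(""))             # group 1 matched nothing, so \1 fails
```

In the last call the group is skipped entirely, so there is no text for the backreference to repeat and the match as a whole fails, even though every other piece of the pattern could match the empty string.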
> Otherwise the regular expression above will always fail if the first
> group fails,
> even being optional
There's no such beast as "an optional group". The
part *must* match or the entire regexp fails, period, regardless of whether
or not backreferences appear later. The question mark following doesn't
change this requirement.
'a' must match
but the overall pattern can choose to use this match or not
That's why the regexp as a whole matches the string
part does match 'a', the ? chooses not to use this match, and then the
backreference matches the 'a' that the first group matched. Study the
output of this and it may be clearer:
    import re
    pat = re.compile(r"^((a)?)(\2)$")
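
One way to study that pattern is to print the groups it captures for a handful of inputs. This is only a sketch: the helper name show and the choice of test strings are assumptions made here, not part of the original message. It makes no claim about what gets printed for "a"; the output shows what the interpreter at hand actually does.

```python
import re

# Pattern copied from the post; the helper `show` is made up for this sketch.
pat = re.compile(r"^((a)?)(\2)$")

def show(s):
    """Return the groups tuple if `pat` matches `s`, or None if it does not."""
    m = pat.match(s)
    return m.groups() if m else None

# Group 1 wraps the optional part, group 2 is the inner (a),
# and group 3 is whatever the backreference \2 matched.
for s in ["aa", "a", "", "ebc"]:
    print(repr(s), "->", show(s))
```

Wrapping the optional part in an outer group makes it possible to see separately what the optional part consumed (group 1), what the inner group holds (group 2), and what the backreference matched (group 3).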
> while the regular expression above would match "aa" or "", but not "a".
As above, Python and Perl disagree with you: they match "aa" and "a" but
> My intentions and the issue are clear enough.
Sorry, your intentions weren't clear to me. The issue is, though <wink>.