[Gustavo Niemeyer, on the behavior of re.compile("^(?P<a>a)?(?P=a)$").match("ebc").groups()]

Python and Perl work exactly the same way for the equivalent (but spellable in Perl) regexp ^(a)?\1$: it matches the two strings "a" and "aa" and nothing else. That's what I expected. You didn't give a concrete example of what you think it should do instead. It may have been your intent to say that you believe the regexp *should* match the string "ebc", but you didn't really say so one way or the other. Regardless, neither Python nor Perl matches "ebc" in this case, and that's intended.

The Rule, in vague English, is that a backreference matches the same text as was matched by the referenced group; if the referenced group didn't match any text, then the backreference can't match either. Note that whether the referenced group matched any text is a different question than whether the referenced group is *used* in the match. This is a subtle point I suspect you're missing.
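A minimal sketch of The Rule, assuming a current Python 3 `re` module; the helper `groups_or_none` is a made-up name for illustration only. It checks the numbered and the named spelling of the pattern against "aa", the empty string, and "ebc". The single-character string "a" is left out on purpose: that case hinges on the matched-versus-used distinction, and its outcome may differ between engine versions.

```python
import re

# Numbered and named spellings of the same pattern.
numbered = re.compile(r"^(a)?\1$")
named = re.compile(r"^(?P<a>a)?(?P=a)$")


def groups_or_none(pattern, text):
    """Return the captured groups for a match, or None when nothing matches."""
    m = pattern.match(text)
    return m.groups() if m else None


for text in ("aa", "", "ebc"):
    # "aa": the group matches 'a' and the backreference repeats it.
    # "" and "ebc": the group matches no text, so the backreference
    # has nothing to repeat and the whole pattern fails.
    print(repr(text), groups_or_none(numbered, text), groups_or_none(named, text))
```

Only "aa" should produce a match here; the other two inputs give the group nothing to match, so the backreference cannot match either.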
Otherwise the regular expression above will always fail if the first group fails,
Yes.
even being optional
There's no such beast as "an optional group". The ^(a) part *must* match or the entire regexp fails, period, regardless of whether or not backreferences appear later. The question mark following doesn't change this requirement. (a)? says 'a' must match, but the overall pattern can choose to use this match or not. That's why the regexp as a whole matches the string "a". The (a) part does match 'a', the ? chooses not to use this match, and then the backreference matches the 'a' that the first group matched. Study the output of this and it may be clearer:

    import re
    pat = re.compile(r"^((a)?)(\2)$")
    print pat.match('a').groups()
    print pat.match('aa').groups()
... while the regular expression above would match "aa" or "", but not "a".
As above, Python and Perl disagree with you: they match "aa" and "a" but not "".
... My intentions and the issue are clear enough.
Sorry, your intentions weren't clear to me. The issue is, though <wink>.