[Gustavo Niemeyer]
I still think it should, because otherwise the "^(a)?b\1$" can never be used, and this expression will become "^((a)?)b\1$" if more than one character is needed.
Is that a real concern? I mean that in the sense of whether you have an actual application requiring that some multi-character bracketing string either does or doesn't appear on both ends of a thing, and typing another set of parens is a burden. Both parts of that seem strained.
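For concreteness, here is a minimal sketch of the two spellings under discussion, assuming Python's standard `re` module and the behavior described in this thread (a backreference to a group that did not participate in the match fails). The variable names are only illustrative.

```python
import re

# Group 1 is optional, so it may not participate in the match at all.
unwrapped = re.compile(r'^(a)?b\1$')
# An extra pair of parens makes group 1 always participate,
# possibly matching the empty string.
wrapped = re.compile(r'^((a)?)b\1$')

aba_unwrapped = unwrapped.match('aba')  # group 1 is 'a', so \1 matches 'a'
b_unwrapped = unwrapped.match('b')      # group 1 never matched, so \1 fails
aba_wrapped = wrapped.match('aba')      # group 1 is 'a'
b_wrapped = wrapped.match('b')          # group 1 is '', so \1 matches ''

print(bool(aba_unwrapped), bool(b_unwrapped))
print(bool(aba_wrapped), bool(b_wrapped))
```

In the wrapped form, group 1 matched an empty string while the inner group 2 did not match anything, and only the former can be backreferenced successfully.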
But since nobody agrees with me, and both languages are doing it that way, I give up. :-)
That's wise <wink>. It's not just Python and Perl, I expect you're going to find this in every careful regexp package. There's a painful discussion buried here:
<http://standards.ieee.org/reading/ieee/interp/1003-2-92_int/pasc-1003.2-43.html>
wherein the POSIX committee debated their own ambiguous wording about backreferences. Their specific example is: what should the regexp (in Python notation, not POSIX) ^((.)*\2#)* match in xx#yy## ? Your example is hiding in there, on the "third iteration of the outer loop". The official POSIX interpretation was that it should match just the first 6 characters, and not the trailing #, because in a third iteration of the outer subexpression, . would match nothing (as distinct from matching a null string) and hence \2 would match nothing.
Python and Perl agree, which wouldn't surprise you if you first implemented a regexp engine with stinking backreferences <0.9 wink>. The distinction between "matched an empty string" and "didn't match anything" is night-&-day inside an engine, and people skating on the edge (meaning using backreferences at all!) quickly rely on the exact behavior this implies.
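The POSIX committee's example can be tried directly. This is only a sketch, assuming Python's standard `re` module; note that a backtracking engine may reach the six-character answer through different iterations than the POSIX write-up walks through, so only the overall match is checked here.

```python
import re

# The committee's example, in Python notation.
m = re.match(r'^((.)*\2#)*', 'xx#yy##')

matched = m.group(0)  # the text covered by the overall match
end = m.end()         # offset just past the last matched character

# The trailing '#' is left unmatched: for it to be consumed, \2 would
# have to succeed in an iteration where (.)* matched zero times.
print(repr(matched), end)
```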
Could you please reject the patch at SF?
I'm not sure which one you mean, so on your authority I'm going to reject all patches at SF. Whew! This makes our job much easier <wink>.