Help: non-capturing group RE fails

Robin Thomas robin.thomas at starmedia.net
Mon Feb 12 11:57:04 EST 2001


At 04:12 PM 2/12/01 +0000, roymath at yahoo.com wrote:

># ----------------------------------------------------------------------
>import re
>
>str = r' - <<ba>> @<<b>> <<c>> - '
>desiredResult = r" - ba @<<b>> c - "
>
># The following fails. produces: " -ba @<<b>>c - " (Note: whitespace is
># eaten up.)
># Use non-capturing RE.
>r = re.compile(r'(?:[^@])<<(.*?)>>')

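(?:[^@]) is a group, but a non-capturing one: the character it matches is 
part of the overall match, yet it gets no group number, so you can't refer 
to it as \1. Note that it is not a lookahead or lookbehind assertion; those 
are written (?=...), (?!...), (?<=...) and (?<!...), and they match without 
consuming anything. Because the [^@] character is consumed, re.sub replaces 
it along with the other ungrouped parts of the match, like << and >>.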
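
To make the difference concrete, here is a minimal sketch, assuming a 
Python 3 interpreter (so print() is a function, unlike in the quoted code). 
The variable names are mine, not from the original post. It shows the 
non-capturing group consuming the leading character, and then a negative 
lookbehind, which checks the preceding character without consuming it:

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '

# Non-capturing group: [^@] is matched and consumed, but gets no group number.
noncapturing = re.compile(r'(?:[^@])<<(.*?)>>')
m = noncapturing.search(text)
whole_match = m.group(0)   # includes the leading character
captured = m.groups()      # only the one capturing group
print(repr(whole_match), captured)

# Negative lookbehind: asserts "not preceded by @" without consuming anything,
# so there is nothing to put back in the replacement.
lookbehind = re.compile(r'(?<!@)<<(.*?)>>')
lookbehind_result = lookbehind.sub(r'\1', text)
print(repr(lookbehind_result))
```

One difference to be aware of: the lookbehind version also matches a <<...>> 
at the very start of the string, where [^@] has no character to match.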

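The quickest fix is to capture that leading character as well:

r = re.compile(r'([^@])<<(.*?)>>')

and then to change

>print re.sub(r, r"\1", str)

to

print re.sub(r, r"\1\2", str)

so that the captured character is put back in front of the text found 
between << and >>.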
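
As a quick check of that fix, here is a small sketch, again assuming 
Python 3 and using my own variable names. It compares the replacement 
against the desired result from the original message:

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '
desired = r" - ba @<<b>> c - "

# Capture the leading non-@ character as group 1 and the payload as group 2.
pattern = re.compile(r'([^@])<<(.*?)>>')

# \1 restores the consumed character, \2 is the text between << and >>.
fixed = pattern.sub(r'\1\2', text)
print(repr(fixed))
print(fixed == desired)
```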


># The following succeeds. produces: " - ba @<<b>> c - "
># Use a named group instead of non-capturing RE
>
>def mysub(x):
>     # print x.groups()
>     return x.group('ws') + x.group('pat')
>
>r = re.compile(r'(?P<ws>[^@])<<(?P<pat>.*?)>>')
>print re.sub(r, mysub, str)

Here, (?P<ws>[^@]) is a capturing group, so the character it matches can be 
retrieved and put back into the replacement. That's why it behaves 
differently from the (?:...) version. The fact that you've named the groups 
instead of using the implicit group numbering isn't material.
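
To illustrate that naming isn't what matters, here is a sketch (Python 3 
assumed, helper names by_name and by_number are hypothetical) in which 
named and numbered access give the same output:

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '

named = re.compile(r'(?P<ws>[^@])<<(?P<pat>.*?)>>')
numbered = re.compile(r'([^@])<<(.*?)>>')

def by_name(m):
    # Look the groups up by the names given in the pattern.
    return m.group('ws') + m.group('pat')

def by_number(m):
    # Named groups are still numbered left to right, so this works for both.
    return m.group(1) + m.group(2)

named_result = named.sub(by_name, text)
numbered_result = numbered.sub(by_number, text)
mixed_result = named.sub(by_number, text)
# A template string can refer to named groups with \g<name>.
template_result = named.sub(r'\g<ws>\g<pat>', text)
print(repr(named_result))
```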


--
Robin Thomas
Director, Platform Engineering
StarMedia Network, Inc.
robin.thomas at starmedia.net




