Help: non-capturing group RE fails
Robin Thomas
robin.thomas at starmedia.net
Mon Feb 12 11:57:04 EST 2001
At 04:12 PM 2/12/01 +0000, roymath at yahoo.com wrote:
># ----------------------------------------------------------------------
>import re
>
>str = r' - <<ba>> @<<b>> <<c>> - '
>desiredResult = r" - ba @<<b>> c - "
>
># The following fails. produces: " -ba @<<b>>c - " (Note. whitespace is eaten up).
># Use non-capturing RE.
>r = re.compile(r'(?:[^@])<<(.*?)>>')
(?:[^@]) is a group, but a non-capturing one, so it cannot be referenced.
It is not a zero-width assertion like a lookahead or lookbehind: it matches
and consumes one character as part of the overall match, but does not get a
group number. That's why that character is replaced along with the other
ungrouped parts of the match, like << and >>.
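A small sketch of the point above, using the string from the original post. The name "text" is mine (the original calls it "str"), and print(...) is written so that it runs on both old and current Python versions. It shows that the character matched by (?:[^@]) is part of the whole match but gets no group number:

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '

# First match of the original pattern.
m = re.search(r'(?:[^@])<<(.*?)>>', text)

whole = m.group(0)   # everything the pattern consumed, leading space included
groups = m.groups()  # only the numbered (capturing) groups

print(repr(whole))
print(groups)
```

Since re.sub replaces the whole match, the leading space is dropped unless the replacement puts it back, and with no group number there is nothing to put back.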
The quickest fix is to capture the leading character:
r = re.compile(r'([^@])<<(.*?)>>')
and then to change
>print re.sub(r, r"\1", str)
to
print re.sub(r, r"\1\2", str)
so that the captured character is put back in the replacement.
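For reference, a runnable sketch of the quick fix, again with "text" standing in for the original "str". The second substitution is an alternative that is not part of the reply above: a negative lookbehind, (?<!@), which really is a zero-width assertion and consumes nothing. Note that it is not exactly equivalent, since it also matches a << at the very start of the string, where [^@] would have nothing to match.

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '

# Quick fix: capture the leading character and restore it with \1.
fixed = re.sub(r'([^@])<<(.*?)>>', r'\1\2', text)
print(repr(fixed))

# Alternative (my assumption about what was wanted): a zero-width
# negative lookbehind, so only <<...>> itself is replaced.
alt = re.sub(r'(?<!@)<<(.*?)>>', r'\1', text)
print(repr(alt))
```

Both calls give the desired result for this particular input.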
># The following succeeds. produces: " - ba @<<b>> c - "
># Use a named group instead of non-capturing RE
>
>def mysub(x):
> # print x.groups()
> return x.group('ws') + x.group('pat')
>
>r = re.compile(r'(?P<ws>[^@])<<(?P<pat>.*?)>>')
>print re.sub(r, mysub, str)
Here, (?P<ws>[^@]) really is a capturing group, not a non-capturing one, so
the character it matches can be put back in the replacement. That's why it
behaves differently. The fact that you've named the groups instead of using
the implicit group numbering isn't material.
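To illustrate that naming is not what matters, here is a sketch comparing numbered groups, named groups referenced from a replacement string with \g<name>, and the callback from the original post. The variable names are mine; all three should produce the same string.

```python
import re

text = r' - <<ba>> @<<b>> <<c>> - '

# Numbered capturing groups.
numbered = re.sub(r'([^@])<<(.*?)>>', r'\1\2', text)

# Named capturing groups, referenced by name in the replacement string.
named_pat = re.compile(r'(?P<ws>[^@])<<(?P<pat>.*?)>>')
named = named_pat.sub(r'\g<ws>\g<pat>', text)

# Named groups with a callback, as in the original post.
def mysub(m):
    return m.group('ws') + m.group('pat')

via_callback = named_pat.sub(mysub, text)

print(repr(numbered))
print(numbered == named == via_callback)
```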
--
Robin Thomas
Director, Platform Engineering
StarMedia Network, Inc.
robin.thomas at starmedia.net