Help: non-capturing group RE fails
hamish_lawson at yahoo.co.uk
Mon Feb 12 15:18:00 EST 2001
Robin Thomas wrote:
> (?:[^@]) is not a "group" that can be referenced. It is an
> "assertion", or "lookahead assertion", that is part of the match but
> does not get a group
I don't mean to pick holes in Robin's explanation, but I believe the
terminology used above isn't quite correct regarding assertions. I
point this out just so that Roy doesn't get confused when reading
reference material on REs - the concepts can be tricky enough!
Robin is correct in saying that (?:) can't be referenced since it's not
memorised; it's what's called a non-capturing group. However, it does
consume the string, in that it causes the RE engine's 'pointer' to be
advanced through the string, and hence by definition isn't an
assertion. An assertion specifies a condition that must be fulfilled,
but doesn't cause the string to be consumed (i.e. doesn't advance the
pointer). (But to be fair to Robin, he makes the difference between
consuming and capturing clearer in another post.) The term 'capturing'
relates to whether the content is memorised for later reference
(whether by index or by name); 'consuming' to whether it causes the
pointer to be advanced.
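To make the distinction concrete, here's a small sketch in modern Python 3. The patterns and the string "abc" are just my own illustration, not part of Roy's problem. The non-capturing group and the look-ahead both leave groups() empty, but only the group moves the end of the match forward. A capturing group both consumes and memorises, and can be referenced by index or by name:

```python
import re

# Illustrative patterns only; "abc" is a made-up test string.
text = "abc"

# Non-capturing group: consumes 'b' but memorises nothing.
grouped = re.match("a(?:b)", text)
print(grouped.group(0), grouped.end(), grouped.groups())  # ab 2 ()

# Look-ahead assertion: requires 'b' to follow, but consumes nothing.
asserted = re.match("a(?=b)", text)
print(asserted.group(0), asserted.end(), asserted.groups())  # a 1 ()

# Capturing group: consumes 'b' and memorises it, by index or by name.
named = re.match("a(?P<second>b)", text)
print(named.group(1), named.group("second"))  # b b
```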
Consider the code below.
import re
r = re.compile("a(?:bd)+(?=c)(..)")
s = "HKabdbdGKMDabdbdbdcWFGT"
m = r.search(s)
print(m.group(1))
This prints out 'cW'.
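It may also help to see where the match actually lands. The first 'abdbd' (starting at index 2) is followed by 'G', so the look-ahead fails there and search() moves on to the later 'abdbdbd'. A quick check in Python 3, using the standard match-object methods group(), start() and span():

```python
import re

r = re.compile("a(?:bd)+(?=c)(..)")
s = "HKabdbdGKMDabdbdbdcWFGT"
m = r.search(s)

print(m.group(0))  # abdbdbdcW  (the whole match)
print(m.start())   # 11  (the earlier 'abdbd' at index 2 is skipped)
print(m.span(1))   # (18, 20)  (group 1 covers 'cW')
```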
I want the '+' to apply to the pair 'bd' rather than just 'd', but
since I don't care about referring to the 'bd' pairs that are matched,
I use a non-capturing group '(?:bd)' rather than a capturing group
(bd); this means that the later '(..)' group in the pattern will be the
first referrable group.
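To illustrate both points, here are a few variations I've made up for comparison (re.fullmatch needs Python 3.4 or later). Without the parentheses the '+' binds only to 'd', and switching to a capturing group pushes '(..)' back to group 2:

```python
import re

# Made-up variations on the pattern above, for comparison only.
s = "HKabdbdGKMDabdbdbdcWFGT"

# Without a group, '+' repeats only 'd', so 'abd+' can't span 'abdbd'.
single_d = re.fullmatch("abd+", "abdbd")
print(single_d)  # None

# With a group, '+' repeats the pair 'bd'.
paired = re.fullmatch("a(?:bd)+", "abdbd")
print(paired.group(0))  # abdbd

# Capturing group: (bd) becomes group 1 (holding its last repetition),
# so (..) is now group 2.
cap = re.search("a(bd)+(?=c)(..)", s)
print(cap.groups())  # ('bd', 'cW')

# Non-capturing group: (..) is the first referrable group.
non = re.search("a(?:bd)+(?=c)(..)", s)
print(non.groups())  # ('cW',)
```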
The '(?=c)' in the pattern is a look-ahead assertion that specifies
that the 'bd' pairs must be followed by 'c'. But because it is an
assertion, it doesn't cause the string to be consumed. Hence the 'c' is
still available to be consumed by the '(..)' expression.
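For contrast, here's the same pattern with the assertion replaced by a plain literal 'c' (again just a variation for illustration, in Python 3). Now the 'c' is consumed, so '(..)' has to start at the character after it:

```python
import re

s = "HKabdbdGKMDabdbdbdcWFGT"

# Look-ahead: 'c' is checked but not consumed, so (..) begins at 'c'.
ahead = re.search("a(?:bd)+(?=c)(..)", s)
print(ahead.group(1))  # cW

# Literal 'c' (illustrative variation): 'c' is consumed,
# so (..) begins just after it.
literal = re.search("a(?:bd)+c(..)", s)
print(literal.group(1))  # WF
```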