Regular Expression Grouping

Michael J. Fromberger Michael.J.Fromberger at Clothing.Dartmouth.EDU
Sun Aug 12 19:48:21 CEST 2007

In article <1186939262.144073.182450 at>,
 linnewbie at wrote:

> Fairly new to this regex thing, so this might be very juvenile but
> important.
> I cannot understand why 'c' constitutes a group here without being
> surrounded by "(" ,")" ?
> >>> import re
> >>> m = re.match("([abc])+", "abc")
> >>> m.groups()
> ('c',)
> Grateful for any clarity.


I believe your confusion arises from the placement of the "+" operator 
in your expression.  You wrote:

    ([abc])+

This means, in plain language, "one or more groups in which each group 
contains a string of one character from the set {a, b, c}."

Contrast this with what you probably intended, to wit:

    ([abc]+)

The latter means, in plain language, "a single group containing a string 
of one or more characters from the set {a, b, c}."

In the former case, the "+" quantifier is greedy: it repeats the 
parenthesized group as many times as it can.  The group therefore 
matches three times, once for each character of "abc", and because a 
group retains only its most recent match, m.groups() shows you only the 
last one, 'c'.
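
To make the repeated-group behaviour concrete, here is a minimal sketch 
using only the standard re module (the variable names are illustrative, 
not from the original post).  It shows that the overall match covers 
all of "abc" while group 1 holds only the final iteration:

```python
import re

# The group ([abc]) is repeated by "+", so it is matched once per character.
m = re.match(r"([abc])+", "abc")

whole = m.group(0)     # text consumed by the entire pattern
last = m.group(1)      # what group 1 captured on its final iteration
last_span = m.span(1)  # (start, end) of that final capture

print(whole, last, last_span)
```

The span of group 1 covers only the third character, which is why 
m.groups() reports ('c',) even though the whole match is 'abc'.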

Compare this with the following:

] import re
] m = re.match('([abc]+)', 'abc')
] m.groups()
=> ('abc',)

I suspect the latter is what you are after.
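
If, on the other hand, each character really were wanted as a separate 
item, a repeated group would not give you that, since it keeps only its 
last capture.  One possible approach, offered here as an assumption 
about what might be wanted rather than as part of the original 
question, is re.findall with the unquantified character class:

```python
import re

# findall returns every non-overlapping match of the pattern, in order.
chars = re.findall(r"[abc]", "abc")

print(chars)
```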


Michael J. Fromberger             | Lecturer, Dept. of Computer Science  | Dartmouth College, Hanover, NH, USA
