Regular Expression Grouping

Michael J. Fromberger Michael.J.Fromberger at Clothing.Dartmouth.EDU
Sun Aug 12 19:48:21 CEST 2007


In article <1186939262.144073.182450 at z24g2000prh.googlegroups.com>,
 linnewbie at gmail.com wrote:

> Fairly new to this regex thing, so this might be very juvenile but
> important.
> 
> I cannot understand why 'c' constitutes a group here without being
> surrounded by "(" and ")"?
> 
> >>> import re
> >>> m = re.match("([abc])+", "abc")
> >>> m.groups()
> ('c',)
> 
> Grateful for any clarity.

Hello!

I believe your confusion arises from the placement of the "+" operator 
in your expression.  You wrote:

  '([abc])+'

This means, in plain language, "one or more repetitions of a group, 
where each repetition matches exactly one character from the set 
{a, b, c}."

Contrast this with what you probably intended, to wit:

  '([abc]+)'

The latter means, in plain language, "a single group containing a string 
of one or more characters from the set {a, b, c}."

In the former case, greedy matching maximizes the number of times the 
quantified expression is matched.  The group therefore matches three 
times, once for each character of "abc".  Because a repeated group 
retains only its most recent match, the result shows you only the last 
occurrence, 'c'.
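
To see this behaviour directly, here is a small sketch using only the 
standard re module.  The variable names (whole, last, span) are my own, 
chosen for illustration.  The overall match covers all of "abc", while 
group 1 reports only its final repetition:

```python
import re

# The "+" sits outside the parentheses, so group 1 is matched repeatedly.
m = re.match(r"([abc])+", "abc")

whole = m.group(0)   # the entire matched text
last = m.groups()    # group 1 as it stood after its final repetition
span = m.span(1)     # where that final repetition sits in the input

print(whole)   # abc
print(last)    # ('c',)
print(span)    # (2, 3)
```

The span shows that the captured 'c' is the third character, which is 
the last repetition of the group rather than the first.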

Compare this with the following:

] import re
] m = re.match('([abc]+)', 'abc')
] m.groups()
=> ('abc',)

I suspect the latter is what you are after.
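
If, on the other hand, you wanted each character as a separate result, 
one possible approach (my assumption about what might be useful, not 
something you asked for) is to iterate over the matches with 
re.finditer rather than repeating the group.  A minimal sketch, with 
variable names of my own choosing:

```python
import re

text = "abc"

# One group holding the whole run of characters.
single = re.match(r"([abc]+)", text).groups()

# One match object per character; collect group 1 from each.
each = [m.group(1) for m in re.finditer(r"([abc])", text)]

print(single)   # ('abc',)
print(each)     # ['a', 'b', 'c']
```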

Cheers,
-M

-- 
Michael J. Fromberger             | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/  | Dartmouth College, Hanover, NH, USA


