[Tutor] Regular Expression question
Fri Apr 18 16:41:00 2003
On Fri, 18 Apr 2003, Scott Chapman wrote:
> On Friday 18 April 2003 11:46, Jay Dorsey wrote:
> > Scott Chapman wrote:
> > > Is it possible to make a regular expression that will match:
> > > '<html blah>' or '<html>'
> > > without having to make it into two complete expressions separated by a
> > > pipe: r'<html[ \t].+?>|<html>'
> > >
> > > I want it to require a space or tab and at least one character before the
> > > closing bracket, after 'html', or just the closing bracket.
> > >
> > > Scott
> > How about
> > '<html([ \t][^>]+)?>'
> Thanks for the reply. After seeing these replies, it seems clear that
> you can use the grouping ()'s for more than just capturing a section for
Yes, the grouping parentheses are serving two functions: they are defining
a group for both capturing values and for aggregation.
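As a rough illustration of both roles at once, here is a small sketch built around Jay's pattern. The helper name attrs_of is made up for this example; the '?' applies to the whole parenthesized chunk (aggregation), and group(1) reports what that chunk matched (capturing).

```python
import re

# Jay's pattern: the '?' makes the whole parenthesized part optional.
html_tag = re.compile(r'<html([ \t][^>]+)?>')

def attrs_of(text):
    """Hypothetical helper: return the captured attribute text, None if the
    optional group was skipped, or 'no match' if the tag does not match."""
    m = html_tag.match(text)
    if m is None:
        return 'no match'
    return m.group(1)

print(repr(attrs_of('<html>')))        # None
print(repr(attrs_of('<html blah>')))   # ' blah'
print(repr(attrs_of('<htmlblah>')))    # 'no match'
```

When the optional group takes no part in the match, group(1) comes back as None rather than an empty string.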
There's another set of regular-expression parentheses that don't capture,
although they do still aggregate:
>>> import re
>>> regex = re.compile('(?:a+)(b+)(a+)')
>>> matchobj = regex.match('aaaaaabbbbaa')
Notice that the first set of parentheses, the ones that match against the
first run of a's, don't form a captured group.
[Wow, "captured group" sounds like a term from Go or something... *grin*]
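One way to see the difference is to compare groups() for the same pattern written with and without the '?:' marker. This is only a sketch; the variable names are picked for the example.

```python
import re

# Same pattern twice; only the first pair of parentheses differs.
capturing = re.compile('(a+)(b+)(a+)')
non_capturing = re.compile('(?:a+)(b+)(a+)')
text = 'aaaaaabbbbaa'

print(capturing.match(text).groups())      # ('aaaaaa', 'bbbb', 'aa')
print(non_capturing.match(text).groups())  # ('bbbb', 'aa')
```

Both patterns match the same text; the non-capturing version just leaves the leading a's out of the numbered groups, so group(1) becomes the b's.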
The non-grouping parentheses use the special form:

    (?:...)
The docs on them are a little sparse, but you can read more about them
in the re module's syntax reference.
Look for the words "non-grouping" and you should see them. There are
actually a few Python-specific extensions to the regular-expression
grouping that are pretty cool: it's even possible to create "named"
groups so that we don't have to do things like column counting to keep
track of groups.
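Here is a minimal sketch of named groups, using the (?P<name>...) form. The names 'bs' and 'tail' are made up for this example.

```python
import re

# (?P<name>...) captures like ordinary parentheses but also labels the group.
regex = re.compile(r'(?:a+)(?P<bs>b+)(?P<tail>a+)')
matchobj = regex.match('aaaaaabbbbaa')

print(matchobj.group('bs'))     # bbbb
print(matchobj.group('tail'))   # aa
print(matchobj.group(1))        # bbbb (named groups are still numbered too)
print(matchobj.groupdict())     # {'bs': 'bbbb', 'tail': 'aa'}
```

Looking a group up by name keeps working even if more parentheses are added to the pattern later, which is the point of not counting columns.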
> I think I missed that in the docs I've been reading. I wonder where all
> this works? For instance, will it work on either side of a '|'? I'll
> have to play with this further!
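A quick experiment suggests that grouping does work around a '|', both to limit how far the alternation reaches and inside each branch. The two patterns below are invented for this sketch and are not from the earlier messages.

```python
import re

# The parentheses limit the '|': only 'html' and 'body' alternate here.
tag = re.compile(r'<(html|body)([ \t][^>]+)?>')

# Groups can also sit inside each branch of a '|'.
either = re.compile(r'(a+)x|(b+)y')

print(tag.match('<body bgcolor=white>').groups())  # ('body', ' bgcolor=white')
print(tag.match('<html>').groups())                # ('html', None)
print(either.match('aax').groups())                # ('aa', None)
print(either.match('bby').groups())                # (None, 'bb')
```

A group in the branch that did not take part in the match shows up as None.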
It sounds like you're getting interested in regular expressions. You may
find AMK's "Regular Expression HOWTO" a slightly gentler introduction to
the subject.
Best of wishes to you!