[Tutor] Regular Expression question

Fri Apr 18 16:41:00 2003

On Fri, 18 Apr 2003, Scott Chapman wrote:

> On Friday 18 April 2003 11:46, Jay Dorsey wrote:
> > Scott Chapman wrote:
> > > Is it possible to make a regular expression that will match:
> > > '<html blah>' or '<html>'
> > > without having to make it into two complete expressions separated by a
> > > pipe: r'<html[ \t].+?>|<html>'
> > >
> > > I want it to require a space or tab and at least one character before the
> > > closing bracket, after 'html', or just the closing bracket.
> > >
> > > Scott
> >
> > How about
> >
> > '<html([ \t][^>]+)?>'
> >
>
> Thanks for the reply.  After seeing these replies, it seems clear that
> you can use the grouping ()'s for more than just capturing a section for
> output.

Hi Scott,

Yes, the grouping parentheses are serving two functions: they are defining
a group for both capturing values and for aggregation.
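
Here's a rough sketch of the aggregation side, using the pattern Jay
suggested.  The helper name is_html_tag() and the test strings are made up
for illustration, and fullmatch() assumes a reasonably recent Python 3:

```python
import re

# Jay's pattern: the parentheses bundle "[ \t][^>]+" into a single unit,
# and the trailing "?" makes that whole unit optional.
html_tag = re.compile(r'<html([ \t][^>]+)?>')

def is_html_tag(text):
    """Return True if the entire string is an <html> or <html ...> tag."""
    return html_tag.fullmatch(text) is not None

for candidate in ['<html>', '<html blah>', '<htmlblah>', '<html >']:
    print(repr(candidate), is_html_tag(candidate))
```

The first two candidates should be accepted and the last two rejected,
since the optional part needs a space or tab followed by at least one more
character.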

There's another set of regular-expression parentheses that don't capture,
although they do still aggregate:

###
>>> import re
>>> regex = re.compile('(?:a+)(b+)(a+)')
>>> matchobj = regex.match('aaaaaabbbbaa')
>>> matchobj.group(1)
'bbbb'
>>> matchobj.group(2)
'aa'
###

Notice that the first set of parentheses, the ones that match against the
first run of "a"s, don't form a captured group.

[Wow, "captured group" sounds like a term from Go or something... *grin*]

These non-capturing parentheses use the special form:

    (?:
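
Applied to the <html> pattern from earlier in the thread, a minimal
comparison might look like this.  The variable names are just
illustrative:

```python
import re

# Same pattern twice: once with ordinary parentheses, once with (?:...).
capturing = re.compile(r'<html([ \t][^>]+)?>')
non_capturing = re.compile(r'<html(?:[ \t][^>]+)?>')

# Both accept the same strings, but only the first one records a group.
print(capturing.match('<html blah>').groups())      # (' blah',)
print(non_capturing.match('<html blah>').groups())  # ()
```

Both versions match the same text; the only difference is whether the
optional part shows up in the match object's groups.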

The docs on them are a little sparse, but you can read more about them
here:

    http://www.python.org/doc/lib/re-syntax.html

Look for the words "non-grouping" (the docs' name for them) and you should
see them.  There are actually a few Python-specific extensions to the
regular-expression grouping that are pretty cool: it's even possible to
create "named" groups so that we don't have to do things like counting
parentheses to keep track of groups.
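
As a small sketch of named groups, here is the earlier example again with
the (?P<name>...) form.  The group names "head", "middle", and "tail" are
ones I picked for illustration:

```python
import re

# Each (?P<name>...) group can be looked up by name instead of by position.
regex = re.compile(r'(?P<head>a+)(?P<middle>b+)(?P<tail>a+)')
matchobj = regex.match('aaaaaabbbbaa')

print(matchobj.group('middle'))  # bbbb
print(matchobj.group('tail'))    # aa
print(matchobj.groupdict())      # all named groups as a dictionary
```

Named groups are still numbered as well, so matchobj.group(2) gives the
same result as matchobj.group('middle') here.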

> I think I missed that in the docs I've been reading.  I wonder where all
> this works?  For instance, will it work on either side of a '|'?  I'll
> have to play with this further!

It sounds like you're getting interested in regular expressions.  You may
find AMK's "Regular Expression HOWTO" a slightly gentler introduction to
regular expressions:

    http://www.amk.ca/python/howto/regex/

Best of wishes to you!